Measurement 

in 

Today’s Schools 
third edition 





by 

C. C. Ross
Revised by

Julian C. Stanley

PROFESSOR OF EDUCATION
UNIVERSITY OF WISCONSIN




Englewood Cliffs, N. J.
PRENTICE-HALL, INC. 




COPYRIGHT, 1941, 1947, 1954, BY
PRENTICE-HALL, INC.
Englewood Cliffs, N. J.

All rights reserved. No part of this book may be reproduced
in any form, by mimeograph or any other means, without
permission in writing from the publishers.

L. C. Cat. Card No. 54-«S46


First printing      May, 1954
Second printing     February, 1955
Third printing      March [year illegible]
Fourth printing     April [year illegible]
Fifth printing      January, 1959
Sixth printing      February, 1960
Seventh printing    February, 1961
Eighth printing     January [year illegible]








Preface to the Third Edition 


In this third edition of Professor Ross's book, the general framework of
the earlier editions has been retained, but deletions, insertions, or other
changes have been made on nearly every page to modernize the content
and to increase readability.

"The Statistical Analysis of Test Results," Chapter 8 in previous
editions, has been rewritten almost completely and moved forward to become
Chapter 3 and thus furnish needed background early in the book. Fifty
two-option multiple-choice instructional items in Appendix A, with
answers and explanations in Appendix E, and the review of square root
computation in Appendix D were added to round out this material.

The old Chapter 12, "Practice," has been absorbed by Chapter 11, now
called "Motivation and Practice as Related to Testing." The chapter on
"School Marks" was dropped, since it overlapped other portions of the
book and is now outdated.

Chapter 17, "Some Present Trends," and all six appendices are
completely new. Appendix B, "A Simplified Item-Analysis Procedure," is based
upon hitherto unpublished studies by the writer. It sets forth, perhaps for
the first time in an American elementary measurements book, a simple,
illustrated, complete technique for determining the discriminating power
and difficulty of each question in a test and several characteristics of the
test scores. Appendix C, "Scoring Rearrangement (Ranking) Test Items,"
contains a table with values preferable to those commonly used for this
type of item. Appendix F, "Publishers of Standardized Tests," replaces a
shorter list formerly placed at the end of the final chapter.

Three other additions should make the book easier to use: chapter
subheadings in the Table of Contents, a List of Tables, and an Author Index.

I am indebted to a number of persons for assistance during the course of
this revision. Mr. Gordon D. Mock checked many bibliographic entries.
Professor Chester W. Harris supplied excellent comments concerning the
items in Appendix A, Professor Robert L. Ebel made several helpful
suggestions with reference to Appendices B and C, and Mrs. Margaret T.
Aldridge and Professor Eric E. Gardner reviewed Chapter 3.

In particular, I am grateful to my wife, Rose S. Stanley, for her
painstaking secretarial work and for otherwise expediting the revision.

Julian C. Stanley



Preface to the Second Edition 


Since publication of the first edition of this book, experimentation in the
field of measurement has made considerable progress. In this revision, the
author has taken advantage of developments in this field, revising almost
all the material from the first edition and including a great deal of
altogether new material. All bibliographies and citations have been revised; a
list of leading publishers of tests has been added to the final chapter.

Chapter tests and exercises, incorporated directly into each chapter of
the first edition, have now been compiled into a separate workbook. This
arrangement is designed to save valuable time for the student working on
the exercises and for the teacher correcting them.

Sincere appreciation is due Mrs. Billy Whitlow Smith for considerable
work both in the preparation of the manuscript and in correcting proof.
Her assistance has been invaluable at every stage of the book's progress.



Preface to the First Edition


It is doubtless true that more progress in measurement has been made
during the past quarter of a century than during all the years preceding.
But the pattern of the measurement books has remained much the same.
They have been very definitely centered about subject matter. The
treatment has usually been organized around the conventional school subjects,
and much space has been devoted to lists and descriptions of the measuring
instruments available.

Authors of these texts have encountered obstacles increasingly difficult
to surmount. The rapid increase in the number of tests and scales published
has made it impossible to keep the books either complete or up to date.
Even the most carefully compiled list of selected tests was likely to be
rendered obsolete by the publication of better tests before the book was off
the press. Fortunately, in recent years the appearance of rather complete and
frequently revised bibliographies of published tests, together with critical
evaluations, has made detailed lists and descriptions of available measuring
instruments in textbooks no longer necessary.

Meanwhile instructors in measurement have manifested a growing
dissatisfaction with existing texts on the subject. For example, the typical
class in measurement for high school teachers has consisted of persons
representing a variety of fields, but no one person has been interested in more
than two or three of those discussed in the textbook, the rest of the material
being largely deadwood. At the same time the enormous expansion of the
experimental literature relating to measurement has had to be considered
in any course that is at all adequate. And here the average book has left
much to be desired.

Fifteen years' experience in teaching educational measurement to college
classes has led the author to attempt a functional approach to the subject.
The present work is the outgrowth of this experience. The emphasis is,
therefore, not so much upon the description of the tools themselves as upon
the multitude of problems relating to their intelligent use and interpretation
by classroom teachers and school administrators.

It appears to the author that the time has come for a critical appraisal
of measurement in today's schools and for a careful search for
generalizations to guide both theory and practice. The experimental evidence
supporting these generalizations has been examined, and wherever possible
reported in the language of the original author.

Since the functions of measurement are much the same on all educational
levels, the illustrations have been drawn from both the elementary school
and the secondary school, and to some extent from college. It is hoped that
the book will be found useful to teachers, and to prospective teachers,
regardless of the subject or the level of instruction.

In the preparation of the book the author has incurred obligations that
are numerous and great. His first major indebtedness has been to his former
teachers, notably Professors Edward L. Thorndike and William A. McCall,
of Teachers College, Columbia University. The heavy obligation which the
author owes to his co-workers in the field of measurement, upon whose
publications he has freely drawn, is indicated by the numerous citations
throughout the book. The fullest co-operation of these authors and their
publishers is gratefully acknowledged. Special thanks are due to Professor
A. B. Crawford, who has used a preliminary edition of the book at
Transylvania College and at the University of Kentucky, and who has made
numerous constructive suggestions, and to Professor G. M. Ruch, of the
United States Office of Education, who has read the manuscript, and whose
pertinent criticisms have been invaluable. Finally, the author is indebted
to his own students, who for three years have used a preliminary edition of
the book and who have offered many suggestions that have contributed
greatly to its improvement.


C. C. Ross



Table of Contents 


Part I

THE PROBLEM OF MEASUREMENT

1. Measurement in the Modern World

Measurement in science; Measurement in education

2. The Historical Development of Measurement in Education

Introduction; The history of intelligence tests; The history of
achievement tests; The history of character, personality, and interest
measurement; Some important publications; Some relatively recent
tendencies

3. The Statistical Analysis of Test Results

General considerations; Classification and tabulation; Some elementary
notions concerning quantitative data; Finding the mode, the median, and
the mean; Measures of variability or scatter; Measures of relationship;
Measures of error; Summary; Instructional test items

4. The Characteristics of a Satisfactory Measuring Instrument

Introduction; Validity; Reliability; Usability; Some generalizations
regarding the problem of measurement


Part II

THE CONSTRUCTION OF TEACHER-MADE TESTS

5. General Principles of Test Construction

Planning the test; Preparing the test; Trying out the test; Evaluating
the test

6. Principles of Constructing Specific Types of Objective Tests

Introduction; Simple-recall tests; Completion tests; Alternative-response
tests; Multiple-choice tests; Matching tests; Rearrangement tests



7. The Construction and Use of Essay Examinations

Limitations of the essay examination; Advantages of the essay
examination; Suggestions for improving essay examinations


Part III

THE TESTING PROGRAM

8. Steps in the Testing Program

Determining the purpose of the program; Selecting the appropriate test
or tests; Administering the tests; Scoring the tests; Analyzing and
interpreting the scores; Applying the results; Retesting to determine
the success of the program; Making suitable records and reports

9. The Graphical Representation of Educational Data

The value of graphs; Representing the record of an individual;
Representing a frequency distribution; Representing two or more
distributions; General suggestions for constructing graphs

10. The Uses and Limitations of Norms

Norms and standards; Raw scores and derived scores; The use of norms in
interpreting scores on intelligence tests; The use of norms in
interpreting scores on achievement tests; Methods of comparing
intelligence and achievement; The use of norms in interpreting scores
on personality tests


Part IV

MEASUREMENT IN INSTRUCTION

11. Motivation and Practice as Related to Testing

The problem of motivation; The relation of measurement to motivation
in teaching; The relation of measurement to motivation in learning;
[illegible]; Practice effect

12. Diagnosis

The problem of diagnosis in education; The techniques of diagnosis

13. Classification and Promotion

The nature and educational significance of human variability; The
activity movement; Homogeneous or ability groups

14. Evaluation in Guidance

The [illegible] and importance of guidance; The [illegible] of
measurement in guidance; Guidance is a co-operative venture

15. Evaluation of Schools

The problem of evaluation; General principles of evaluation;
[illegible] of the school


16. Public Relations

The problem; Ordinary agencies of public information; Official
publications; Report cards and letters to parents; Other avenues of
public information; Mobilizing public opinion

17. Some Present Trends


APPENDICES

A. Fifty Questions to Help You Learn Statistics

B. A Simplified Item-Analysis Procedure

Preparing the items; A measure of discrimination; A measure of
difficulty; An illustrative analysis; A discrimination table; Obtaining
the mean and the standard deviation; A simplified procedure for
obtaining a reliability coefficient

C. Scoring Rearrangement (Ranking) Test Items

D. The Computation of Square Roots

E. Answers to Questions in Appendix A

F. Publishers of Standardized Tests

Author Index

Subject Index



List of Tables 


1. The Estimated Grade-Value and Percentage Marks Assigned to an English
   Composition by One Hundred Teachers
2. Percentage Values Assigned to Ten Essay Examination Papers by
   Twenty-Four Examiners
3. A Class Record for a Reading Readiness Test
4. Reading Readiness Scores from Table 3 Arranged in Order of Size and Rank
   Order and Tabulated
5. An Illustration of the Process of Making a Grouped Frequency Distribution
6. Distribution of Reading Readiness Scores for Six Schools in a Certain City
7. The Chronological, Educational, and Mental Ages of the 20 Pupils in an
   Eighth-Grade Class
8. A Two-Way Distribution of Mental Age and Educational Age for an
   Eighth-Grade Class
9. An Extremely Simple Scoring Method
10. Five Consecutive Exploratory Steps in Assigning Overall Scores to 31
    English Themes
11. Dividing the Four D's from the 11-Category Distribution of Table 10 into
    Three Parts
12. The Process of Locating the Median
13. A Short Way to Compute the Mean
14. The Process of Computing the Quartile Deviation, Q
15. A Simplified Way to Compute the Standard Deviation
16. A Scatter Diagram Illustrating Negative Correlation Between Chronological
    Age and Educational Age for 20 Eighth Graders
17. Pretest and Midterm Scores of 43 Graduate Students on Two Teacher-Made
    Objective Tests in Intermediate Statistics
18. Computation of the r Between Scores on Two Teacher-Made Tests
19. The Computation of Rho for Each of Two Students Who Arranged Six
    Historical Events in Chronological Order
20. The Various Values of Rho (ρ) for All Possible Sums of Squared Deviations
    (ΣD²) for N's from 2 through 10
21. Estimating the Coefficient of Correlation by the Spearman Rank-Difference
    Method
22. A Simple Expectancy Table Based upon the 43 Pairs of Scores in the Table 18
    Scattergram, for Which r = [illegible]
23. Effects of Constant and Variable Errors on Certain Types of Statistics
24. Intercorrelations of Intelligence Test Scores and Five-Semester Average
    [illegible] Seniors (124 Boys, [illegible] Girls) in a Los Angeles High
    School
25. [illegible] According to Frequency of Use as Revealed by Two [illegible]
26. Plan for a Testing Program for the Elementary School
27. [illegible] Standardized and Nonstandardized Tests of [illegible]


28. Classification of Tests in The Fourth Mental Measurements Yearbook (1953)
29. Otis Scale for Rating Standard Tests
30. Cole-Von Borgersrode Scale for Rating Standardized Tests
31. A Table for Computing Months Since Last Birthday
32. Table for Equating Intelligence Quotient Values
33. Point Standing for the First and Second Semesters for Low-Ranking
    Freshmen Who Were Told Their Intelligence Test Scores as Compared with
    Those Who Were Not
34. Distribution of Spelling Difficulties and Successful Remedies
35. Frequencies with Which Various Provisions for Individual Differences Were
    Reported in Use or in Use with Unusual Success
36. Trends Toward Greater Provisions for Individual Differences in Elementary
    Schools
37. The Main Methods of Evaluation Used by the Co-operative Study of
    Secondary School Standards, with the Weight Assigned to Each
38. Composition of 1940 Edition of the Alpha, Beta, and Gamma Scales for
    Evaluating Secondary Schools
39. Use of Tests in Evaluating Schools
40. Summary of the Mort-Cornell Score Sheet for the Self-Appraisal of School
    Systems
41. The Rank Orders of Thirteen Topics of School News According to the
    Interests of 5,067 School Patrons Compared with the Space Devoted to These
    Topics by Ten Newspapers
42. The 100 Items in a Five-Option Multiple-Choice Teacher-Made Test
    Arranged According to Discriminating Power
43. Number of Examinees in High and Low Groups Who Chose Each Option of
    Item No. [illegible]
44. Number of Examinees in High and Low Groups Who Chose Each Option of
    [illegible]
45. Number of Examinees in High and Low Groups Who Chose Each Option
    of the 25 Least Discriminating Items
46. Table for Determining Whether or Not a Given Test Item Discriminates
    Significantly Between a High and a Low Group
47. Formulas for Finding (WL + WH) Values at Three Difficulty Levels
48. ΣD² Table for Scoring Rearrangement Items



List of Figures 


1. Test 7 from the Army Alpha
2. Test 6 from the Army Beta
3. A Scale for Measuring Pupils' Attitudes Toward High School
4. The Relative Amount of Relationship Represented by r's of Various Sizes
5. IBM General-Purpose Answer Sheet
6. An Illustration of the Procedure Followed in Scoring Test 3 of the Terman
   Group Test of Mental Ability, Form A
7. A Sample Standard Test Scoring Record
8. An Educational Profile for a Standardized Achievement Test
9. Test Data Summary from the Cumulative Guidance Record of the Department
   of Supervision and Curriculum Development of the National Education
   Association
10. A Cumulative Record in Graphical Form
11. Centile Sheet for College Men and Women on Allport-Lindzey-Vernon Study
    of Values
12. Back to School, United States, 1909-1953
13. A Rather Complex Bar Graph with High Attention Value
14. Motor Buses in Operation in the United States: Fifteen Different Charts
    Based upon the Same Data
15. Profile of a Pupil and the Sixth-Grade Class of Which He Is a Member
16. The Profile of a Tenth-Grade Pupil on the California Achievement Test
17. Profiles for a Student Tested in the Fifth and Sixth Grades
18. A Histogram or Column Diagram Representing the Percentage Values
    Assigned to an Arithmetic Paper by Forty-Two Scorers
19. A Histogram or Column Diagram Representing the Distribution of IQ's in
    a Small Junior High School
20. A Frequency Polygon Representing the Percentage Values Assigned to an
    Arithmetic Paper by Forty-Two Scorers
21. An Actual Curve Compared with the Theoretical Curve of Probability
22. A Percentile Curve Representing the Percentage Values Assigned to an
    Arithmetic Paper by Forty-Two Scorers
23. A Percentile Curve Representing the Distribution of 83 IQ's in a Small
    Junior High School
24. Negative and Positive Skewness
25. Bar Graph Made on the Typewriter Showing the Distribution of 91 IQ's in
    a Junior High School
26. Bar Graph Made on the Typewriter Showing the Percentage of Pupils of
    [illegible] Graduated from High School and the Percentage Who Entered
    High School but Did Not Graduate
27. [illegible] of Grades Seven, Eight, and Nine in Reading Comprehension


28. Frequency Polygons Representing the Distribution of Reading
    Comprehension Scores on the Iowa Silent Reading Tests for the Seventh,
    Eighth, and Ninth Grades of a Certain School
29. Total Comprehension Scores on the Iowa Silent Reading Tests for the
    Seventh, Eighth, and Ninth Grades
30. The Learning of Three Groups Compared: One with Full Knowledge of
    Progress, One with Partial Knowledge of Progress, and One with No
    Knowledge of Progress
31. Correct and Incorrect Location of the [illegible] in a Line Chart Showing
    Median Scores on a Reading Test
32. Grade Profiles for the Seventh, Eighth, and Ninth Grades of a Certain
    Junior High School Made by Connecting the Median Scores on Each Part of
    the Stanford Achievement Test, Advanced Complete Battery, Form
    [illegible]
33. A Line Graph Showing the Medians and Quartiles for Grades Four to Nine,
    Inclusive, in Reading Comprehension
34. The Central Tendency and Variability in Educational Age of Grades 2B to
    9A, Inclusive, in a Small City School System
35. The Relation Between Standard Scores, Percentile Ranks, and Revised
    Stanford-Binet IQ's
36. The Profiles of Two Pupils Who Made the Same Total Score on a General
    Achievement Test
37. A Profile Based Upon Local Norms
38. The Influence of Knowledge of Progress upon Achievement in a College Class
39. A Study of the Influence of Praise and Reproof upon Achievement in
    Fourth-Grade and Sixth-Grade Arithmetic
40. The Five Levels of Educational Diagnosis
41. Analysis Sheet of Test 3, Metropolitan Achievement Tests, Form A,
    Arithmetic Fundamentals, for a Fifth-Grade Class in October
42. Traxler Chart of Suggested Diagnostic and Remedial Procedures in
    Handwriting
43. General Quality of 200 Secondary Schools as Judged by Field Committees
44. Distribution of Mean Scores of Seniors in Forty-Nine Colleges in
    Pennsylvania on a Test of General Academic Knowledge
45. Distributions of Composite IQ's on Forms L and M of the Revised
    Stanford-Binet Intelligence Scales for a Standardization Group of 2,904
    Individuals of CA's 2 to 18 Years
46. A Suggested Technique for Evaluating the Philosophy of a Secondary School
47. Instructions for Using the Evaluative Criteria Developed by the
    Co-operative Study of Secondary School Standards
48. Summary of Evaluative Criteria for the [illegible] Secondary School
49. An Evaluative Procedure for the Content of the Offerings in the Principal
    Subject-Matter Fields of a Secondary School
50. The Computation of Three Measures of the Adequacy of the Book
    Collections in the Library of a Secondary School
51. Evaluative Techniques for the Library Service of a Secondary School
52. The Computation of the Summary Score for the Guidance Service of a
    Secondary School
53. An Evaluative Technique for the Quality of Instruction in a Secondary
    School
54. A Suggested Informal Report to Parents
55. A Report Card Used at the University of Chicago High School






PART I 


The Problem 

of 

Measurement 



1 

Measurement in the Modern World 


From birth to death almost every aspect of our daily lives is touched by
measurement in its numerous forms. At birth the record of that important
event is carefully made according to the nurse's watch. During the next
few days measurements of the baby's weight and temperature are part of
the daily routine of the hospital. Ever afterward, whether in school or
outside, watches, clocks, balances, thermometers, money systems, and other
forms of measurement play prominent roles in the life of every human
being.

The daily round of the typical American probably begins somewhat like
this. He rises at a certain hour by the clock, bathes in water measured
by the meter, and dresses in clothing of a standard size. He begins his
breakfast with half a grapefruit sold by the dozen and sweetened with a
spoonful of sugar sold by the pound. He continues with a bowl of cereal
and [illegible] cups of coffee, both generously mixed with cream or milk sold
[illegible]. He then looks at his watch, jumps in his car, and watches
the speedometer as he hurries to his work, for which he is paid by the hour,
[illegible] month, or year.

[illegible] or another, year in and year out, he keeps this up until he
[illegible] himself to death over a falling stock market measured
[illegible] rising blood pressure, expressed in points. Then the hour
[illegible] accurately noted, he is measured for a casket, and the
[illegible] is set according to the calendar and the clock. Afterward
[illegible] recorded in the family Bible and carved on his tombstone.
[illegible] estate is figured in dollars, and his widow lives the rest
[illegible] income computed in per cent.




These common experiences are characteristic of the emphasis placed on
measurement in the modern world. In fact, if all our various measuring
devices were suddenly destroyed, contemporary civilization would collapse
like a house of cards.


A. Measurement in Science 

Apparently the chief problem of man has always been adjustment. As one
writer puts it, "The civilization of a race is simply the sum total of its
achievements in adjusting itself to its environment." The form of the
problem has indeed varied somewhat from time to time, and still more has the
method of meeting it. For ages the ingenuity of man was directed toward
gaining practical control over the universe about him. At first the process
was the uncritical procedure of trial and error. This fumbling way early led
into such blind alleys as alchemy, astrology, and magic. Later the seers
and wise men began to attempt to put together these scattered bits of
experience and so, in the words of Omar Khayyam, "To grasp this sorry
Scheme of Things entire." Thus was born philosophy. The nature of the
problem had then shifted to understanding the universe, rather than merely
gaining control over it.

Scientific method. About three centuries ago there arose, with the
experimental verification by Galileo of the laws of falling bodies, the
method of modern science. Since that time man's quantitative conquest of
nature has expanded not only into all branches of physics and chemistry
but into biological and psychological phenomena as well. It is no
exaggeration today to assert that science has revolutionized the material
world in which we live. But it has done more than this; as Whitehead says,
science has "practically recoloured our mentality." As a distinguished chemist
puts it, "Man's inner and outer necessities, real or imagined, have made him
both a Scientist and a Philosopher."

Both the content and the method of science are important. The content
of science consists of a continuously expanding body of systematized
knowledge, which is the product of scientific method. The one constant and
universal feature of science is its method of arriving at knowledge. John Dewey
asserts that "the heart of science lies not in conclusions reached, but in the
method of observation, experimentation, and mathematical reasoning by
which conclusions are established."


Longmans, Green & Company, 1928.
Bloomington, Illinois: Public School Publishing Company, 1938.





What, then, is the scientific method? Bertrand Russell suggests this
concise formulation: "The essence of the scientific method is the discovery
of general laws through the study of particular facts." In another volume
Russell elaborates this statement:

In arriving at a scientific law there are three stages: the first consists in observing
the significant facts; the second in arriving at a hypothesis, which, if it is true, would
account for these facts; the third in deducing from this hypothesis consequences
which can be tested by observation.


Conant's definition emphasizes continuity:

Science is an interconnected series of concepts and conceptual schemes that have
developed as a result of experimentation and observation and are fruitful of further
experimentation and observations. In this definition the emphasis is on the word
"fruitful." Science is a speculative enterprise. The validity of a new idea and the
significance of a new experimental finding are to be measured by the consequences,
consequences in terms of other ideas and other experiments. Thus conceived, science
is not a quest for certainty; it is rather a quest which is successful only to the degree
that it is continuous.

But what is the role of measurement in scientific method? From Russell's
three-stage analysis above it would appear that, although measurement
has but little if any bearing on the second stage in the scientific
method, it is closely related to the first and third stages. Measurement
[illegible] a useful function in determining what alleged facts really are
facts, as well as affording an exact method of describing them. It is also
indispensable in the final stage of testing and verification, which is usually
[illegible] by means of [illegible] experiments. A critical treatise on
educational measurement begins [illegible] statement: "Measurement is the
prin-[illegible] that field of human endeavor from [illegible] modern
exactitude." The relationship is stated by Smart [illegible]:

Of course, it must [illegible] that our experience of sense qualities in
perception serves as the basis [illegible] endeavor and that this qualitative aspect
of things is in varying [illegible] completeness assimilated in and through the higher
categories of the several sciences. And this assimilation is effected largely
through the process of [illegible], which thus functions as the connecting link
between mathematics and [illegible] sciences and which is only a higher, i.e., more
[illegible] double-sided process of comparison and discrimination
[illegible] on the qualitative level of experience.



Brief attention will now be given to the relation of measurement to each
of the principal divisions of science. The discussion will observe the
conventional divisions, namely, the "pure sciences" and the "applied sciences,"
though we should note that in his presidential address to the American
Association for the Advancement of Science, Kirtley Mather emphasized
the shortcomings of these terms:

It is significant that when scientists today are philosophizing, they are more
likely to distinguish between "fundamental research" and "technological
development" than between "pure science" and "applied science." The fact is, of course,
that every item of scientific knowledge ever gained, in response to whatever motive,
has been found sooner or later to have practical significance, either directly or
indirectly, in human affairs.


Pure science is distinguished from applied science primarily on the basis
of purpose or motive, and one division of pure science is distinguished from
another on the basis of subject matter. Although the distinction is not
always clear-cut, in general it may be said that pure science aims primarily
at understanding the universe, whereas applied science aims at predicting
and controlling it. In Russell's words, "Science, ever since the time of the
Arabs, has had two functions: first, to enable us to know things and, second,
to enable us to do things."

He also says: "Science, as its name implies, is primarily knowledge;
by convention it is knowledge of a certain kind. Gradually, however,
the aspect of science as knowledge is being thrust into the background
by the aspect of science as the power of manipulating nature."

Measurement in the physical sciences. How can the place of
measurement in any particular branch of science best be determined? Perhaps
chief reliance must be placed upon the testimony of outstanding scientists
in the particular field and recognized historians of science. Astronomy is
doubtless the oldest and among the most highly developed of the sciences.
Although the rise of experimental science is usually dated from Galileo,
who lived about 300 years ago, Boring describes two important experiments
in astronomy which were made as far back as 2,200 years ago. In
commenting upon these early experiments, Boring says:

It is no mere accident that these first two important astronomical experiments
[illegible] of measurement. Measurement provides a [illegible] definition in
observation that can be had in no other [illegible] measurements through
a logical development to their consequences without loss of their precision.


New York: Simon and Schuster.
By permission of Appleton-Century-Crofts.





[illegible] is thus stated: 

The more that exact measurement enters into any branch of Science, the more 
highly is that branch developed. It is for this reason that Chemistry and 
Physics are so far in advance of Botany and Geology. And the reason we have 
so much clearer notions of, for instance, an area or a weight, than of, say, wisdom 
or chivalry, is because the former are measurable, the latter not. It is of the first 
importance in Science that we should, whenever possible, obtain precise quantitative 
statements of phenomena, and thus we see why it is that the introduction of a 
new scientific instrument so often leads to a marked advance in our knowledge. 

Of the physical sciences, physics is usually regarded as the most highly 
developed at the present time. Two outstanding figures in the development 
of modern physics were Lord Kelvin in England and Max Planck 
in Germany. Regarding the place of mathematics and measurement in 
physics, Lord Kelvin says: 


When you can measure what you are speaking about, and can express it in numbers, 
you know something about it; and when you cannot measure it, when you 
cannot express it in numbers, your knowledge is of a meagre and unsatisfactory 
kind. It may be the beginning of knowledge, but you have scarcely in your thought 
advanced to the stage of science. 


Max Planck, regarded as one of the leading exponents of the important 
quantum theory in physics, declares that further progress in the physical 
sciences "will depend essentially on the development and wider application 
of our methods of measurement." The case for measurement in physics 
may very well rest upon the testimony of these expert witnesses, although 
practically every prominent physicist from Galileo and Kepler to Einstein 
could be called. 

Measurement in the biological sciences. The history of the various 
branches of science indicates clearly that not only are the biological sciences 
younger than the physical sciences, but also that measurement and mathematics 
have occupied and still occupy a less prominent place in the biological 
sciences. The primary reason for this is doubtless the greater simplicity 
of the data in the physical sciences. 


F. W. Westaway, Scientific Method: Its Philosophical Basis and Its Modes of Application, 
pages 271-272. New York: Hillman-Curl, Inc., 1937. 

Silvio Fiala, "The Experiment and Its Role in the Theory of Knowledge," Philosophy 
of Science, 18:253-258, July 1951. "The highest level in the sense of most precise 
correspondence between its definitions achieved up to now is in physics, perhaps the 
oldest in natural science. Physics starts from the idea of the experiment as the only 
means by which this correspondence might be achieved." Page 254. 

Quoted by Ronold King, "Physics, Metaphysics and Common Sense," Scientific [illegible] 

Max Planck, Where Is Science Going?, page 96. New York: W. W. Norton & Company. 

Julian Huxley makes this statement: "Sciences, like Empires, have their rise and 
their time of flourishing, though not their decay. Naturally, the order of their rise runs 
parallel with the complexity of their subject-matter. The physical sciences [illegible] 
simplest and most straightforward, the first to start their triumphant career." 
What Dare I Think? New York: Harper & Brothers, 1931. 




Certainly, however, biology has progressed far beyond the stage of 
authority that existed in the Middle Ages, when, for example, the question 
of the number of teeth possessed by the horse was the subject of heated 
debate in many contentious writings. "Apparently," says Locy, "none of 
the contestants thought of the simple expedient of counting them, but 
tried only to sustain their position by reference to authority." Locy recognizes 
three somewhat overlapping phases, or stages, in the development 
of the biological sciences: namely, the descriptive, the comparative, and 
the experimental. 

Without doubt, one of the most important generalizations of modern 
science is that of evolution through natural selection as set forth by Charles 
Darwin near the middle of the last century, and yet it will be remembered 
that his work was nonmathematical, consisting largely of the classification 
of vast amounts of data upon which these epoch-making generalizations 
were based. In fact, as far as method goes, it was largely an extension and 
refinement of that emphasized by Aristotle more than two thousand years 
earlier. 

Whitehead attributes the retardation of science in the Middle Ages 
largely to Aristotle's emphasis on classification rather than measurement. 
Note this statement: 


But the biological sciences, then and till our own time, have been overwhelmingly 
classificatory. If only the schoolmen had measured instead of classifying, how 
much they might have learnt! 


That Darwin's cousin, Sir Francis Galton, took this position is clearly 
indicated by the following statement: "Until the phenomena of any branch 
of knowledge have been subjected to measurement and number, it cannot 
assume the status and dignity of a science." Galton, therefore, proceeded 
to introduce exact measurement and mathematical calculation into the 
theory of evolution. In later years these biometrical methods have been 
greatly extended by Karl Pearson, Spearman, Fisher, and others. The remarkable 
experiments of Mendel in heredity appeared at about the same 
time, although their value was not recognized until about 1900. Because of 
these pioneers, knowledge of heredity has become established on a definite 
mathematical basis. Even earlier than Mendel's time such noted physiologists 
as Müller, Weber, and Helmholtz had been doing much the same 
for physiology. After a survey of the development of natural science from 


’•find 

" \Krrd North IMiitthead op n( pni;e-lt 

”, r..sr ’ ■' 

’•VratwRii l«.y op nf TheXKeni.U»nCompi,n Hot 




Aristotle to Fabre, which shows a definite trend from qualitative to quantitative 
analysis, Peattie concludes: "In short, what science calls for today 
are life histories, and ecological studies — the precise measurement of the 
environmental factors and the inter-relations of organisms." 

The various biological sciences have not been placed upon as definitely 
a quantitative basis as have physics and chemistry, largely because of the 
nature of their data. Some competent students of science think the biological 
sciences have moved too far in that direction. Whitehead, for example, 
expresses regret that "biology apes the manners of physics" while at the 
same time neglecting the unique character of its own subject matter, organisms, 
which are incapable of analysis without the destruction of their 
essential nature. The Gestalt school of psychology in recent years has also 
registered a vigorous protest against the atomic conceptions of mind which 
experimental psychology took over from nineteenth-century physics. 

Measurement in the social sciences. Measurement in the social sciences 
presents a difficult problem. The social sciences are not only newer 
than the natural sciences, but their data are more complex. They study 
human beings, the most complex of all biological organisms, and their 
social relationships, which are far more complicated than purely individual [illegible]. 

The genetic history of the social sciences has been described as follows: 

In the days of Aristotle, Plato, and Pythagoras, philosophy still embraced the 
exact, natural, and social sciences. [illegible] beginning of the nineteenth century 
the exact and [illegible] chemistry, geology, biology— had already left their 
philosophical matrix and were rapidly developing their own [illegible], preserving 
a tendency to return to philosophy for an occasional [illegible] speculative 
overhauling. But the social sciences— history, ethics, law, [illegible] psychology, 
religion, esthetics, anthropology (such as it was) [illegible] the metaphysical 
cradle of Mother Philosophy. One by [illegible] and learned to stand on their own 
feet [illegible] their gait and vocabulary continued [illegible] 

In the preface to The Seven Seals of Science, Mayer states his major 
thesis as follows: 

The central theme of the [illegible] arise and could not have arisen 
simultaneously [illegible] well-defined structure with mathematics at the bottom 
[illegible] upon those that went before, that psychology is only now becoming 
established, and that the social 

Donald Culross Peattie, Green Laurels, page [illegible]. New York: Simon and Schuster. 

Alfred North Whitehead, [illegible]. 

[illegible] The Century Co. Courtesy of D. Appleton-Century Company. 





studies, if they are to be worthy of the name of science, must build upon the natural 
sciences and particularly upon geology, biology, and psychology. 

It is significant that the author, although a professor of economics and 
sociology, uses as title for the final chapter in the book, "Social Science 
in the Making." 

Other writers take a somewhat more optimistic position, and many of 
them indicate specifically the direction the development of social science 
is taking and must take. For example, Ogburn and Goldenweiser write 
as follows: 


Attention finally must be drawn to the increasing importance of statistical 
methods in the social sciences. The extent to which social thought and [illegible] 
will pass from the sphere of opinion, conjecture, and contemplative analysis to that 
of fact, knowledge, and control will depend on their permeation by these scientific 
methods of measurement and statistics. 


Of course there is nothing particularly new about this viewpoint. As 
early as 1798 Malthus attempted to put economics on a definite mathematical 
basis when he announced his celebrated, although erroneous, proposition 
that "population increases in a geometrical ratio, food in an arithmetical 
ratio." A little later, Quetelet showed that the theory of probability 
could be applied to human problems such as insurance. 

Barnes traces the history of sociology and concludes: "There is a general 
agreement that sociology can become a true science of society only in the 
degree to which it is able to appropriate and apply those exact methods of 
measurement and analysis which constitute the indispensable attributes 
of science in general." On the other hand, Ellwood, an eminent sociologist, 
takes a wholly different position. His point of view is clearly stated in these 
words: 


It would seem to me that as we ascend in the scale of life the view that science is 
quantitative measurement of objective conditions becomes less and less applicable, 
not only because measurement becomes more difficult, but because the subjective 
element plays a larger part. Even if the subjective element is capable of certain 
measurements, and even if it is true that whatever exists exists in some quantity 
or number, nevertheless it is obvious that where subjective elements play a large 
part measurement becomes of less importance for accurate knowledge, because it is 
confined to the superficial aspects of the total situation and fails to expose the nature 
of the process which is being investigated. This is especially true in the social 
sciences [illegible] secondary to other [illegible] 


^\\ ilium F ORltimaixlAlcxaiHlciG U uw«.rt m> mat* A-f) 

more Th \\ of SltifulKwl MeOwJ isigw, "q 42 lUlti 

more The U illwus & U ilUn ComiKiiiy 192'1 * ^ 





It seems fairly clear, therefore, that measurement and statistical analysis 
of quantitative data do occupy a prominent place in the social studies, 
although there is no general agreement as to just what this place is. There 
does appear, however, to be universal recognition that the problems are 
more difficult than those presented by the earlier sciences and that their 
solution must be based at least in part on these other sciences, notably 
psychology. Measurement in psychology will be considered at some length 
in later sections. 

Measurement in the applied sciences. It has already been pointed 
out that the distinction between pure and applied science is not always 
easy to draw. In the beginning science appears to have arisen in the 
service of certain basic human needs and desires. Despite its humble 
origin, however, science soon ceased to be but a means to an end and 
became an end in itself. For about a century and a half following Galileo 
it became exclusively the pursuit of the learned, and hardly influenced the 
thoughts or habits of ordinary men at all. The emphasis had shifted from 
applied to pure science. Russell comments upon this fact as follows: 

It is only during the last hundred and fifty years that science has become an 
important factor in [illegible] occurred since the days of the [illegible]. 
One hundred and fifty years of science has proved more explosive 
than five thousand years of pre-scientific culture. 

Science, which arose from man's stern necessity for [illegible] of life, has now 
returned to serve again his [illegible]. Nowhere is measurement more in evidence 
than in its practical applications. A competent observer asserts that one 
[illegible] endeavor into which measurement [illegible]. Indeed, it is these 
applications of [illegible] the mind of the layman [illegible] he is likely to think 
of the [illegible] invention, such as the radio or [illegible] physics and chemistry 
that have made them possible. [illegible] persons would know of Marconi, 
who invented wireless [illegible] who ever heard of Hertz or Maxwell, 
whose pioneer work blazed [illegible]. 

The prominent place [illegible] measurement in these modern 
applications of science to [illegible] industry is too well known to require 
elaboration. Take the modern automobile as an example. Its mechanical 
parts are accurately [illegible] 

:;srndrs.i;?-^f-£r'^ - 

«•]) 0 th .nel Smith op at P‘gc 





subjected both to careful laboratory experimentation and to rigid test 
on the trial grounds. The instrument board presents, as practical aids to 
the user, various devices for measuring gasoline, electric current, oil pressure, 
temperature, and speed of car, as well as perhaps a clock and a radio. 

It is instructive to study what the application of science has done for 
modern cookery. The ordinary untrained housewife still uses recipes with 
such vague directions as "season to taste," "add butter to the size of a 
walnut," "cook in a moderate oven," and so on. In contrast, the modern 
bakery accurately measures all ingredients, mixes them uniformly for a 
specified length of time, and then cooks them at a specified temperature 
for a definite time. This assures a predictable uniformity in the product, 
in contrast with the "luck" of the old-time cook. 

Medicine is an outstanding field in which many discoveries of pure science 
have been applied to the solution of practical human problems. 
Herrick explains how the development and use of various instruments of 
precision have revolutionized medical diagnosis and practice. The measurement 
of blood pressure, body metabolism, and the physical and chemical 
analyses of the blood and other body fluids are as recognized techniques 
today as were height, weight, and temperature a generation ago. The dietitian 
in the kitchen measures the patient's food from the standpoint of 
calories, minerals, and vitamin content with an accuracy approaching that 
of the pharmacist in compounding his medicines and of the nurse in 
administering them. 

Limitations of measurement. Before concluding this discussion of 
the relation between measurement and science, it may be well to note some 
of the difficulties and problems of measurement. It must not be assumed 
that the tools and techniques of measurement have been developed to a 
state of perfection. This is far from true even in physics and chemistry, 
where measurement has progressed furthest. 

Planck, an eminent physicist, offers the warning that "every number 
obtained by physical measurements is liable to a certain possible error." 
Westaway puts the matter in these words: "We may, in fact, look upon 
the existence of error in all measurements as the normal state of things." 
Doubtless, Bertrand Russell has the same idea in mind when he describes 
science as a "succession of approximations." 


In general it may be said that the sources of error in measurement are 
due to the imperfections either in the measuring instruments themselves 
or in the manner in which they are employed. While both of these sources 
of error are subject to a considerable measure of control, neither can be [illegible] 

[illegible] New York: E. P. Dutton & Co., Inc. 

F. W. Westaway, op. cit., pages [illegible]. 

Bertrand Russell, The Scientific Outlook. 

[illegible] op. cit., page [illegible]. 




1. The improvement of existing measuring instruments. 

2. The devising of adequate methods of estimating or allowing for errors. 

3. The development of skill in applying the instruments of measurement 
so as to reduce errors to a minimum and in interpreting the results so as to 
take due account of the errors which cannot be eliminated. 

The first of these methods will be considered at some length in Chapters 
4 to 7, the second will receive attention in Chapter 3, while practically 
the entire book is concerned with the third. 

The limitations of existing measuring instruments do not detract from 
the importance of measurement, although they do add to its difficulty. 
The result is rather to set a special premium upon the skillful use of these 
instruments. As a rule, the cruder any tool is, the greater the skill required 
in its application, if satisfactory results are to be obtained. The early 
automobile, for example, called for much greater skill in its successful operation 
than does its more highly perfected modern successor. 

Conclusions. What, then, is the relation between measurement and 
science? A few generalizations seem fairly clear. 

1. There is a direct relationship between the status of a science and the 
degree to which measurement has been developed in it. In the older and 
better established physical sciences, measurement occupies a fundamental 
place; in the newer biological sciences, measurement occupies a less 
important place; and in the social sciences, the most recent group, 
measurement has made hardly more than a beginning. The evidence seems 
abundantly to support Westaway's statement: "The more that exact 
measurement enters into any branch of Science, the more highly is that 
branch developed." 

2. The prominence of measurement in a science appears to be roughly 
in inverse ratio to the complexity of its subject matter. Inert material 
seems inherently more susceptible to measurement than living organisms. 
Apparently the maximum difficulty comes in the case of man, particularly 
in his social behavior. 

3. All measurement is subject to errors. These errors are due to limitations 
in the tools as well as in the techniques of measurement. To ensure 
satisfactory results, the greater the limitations of the former, the greater 
the skill and insight called for in the latter. 

B. Measurement in Education 

Is education a science? Education is described sometimes as a philosophy, 
sometimes as a science, and sometimes as an art. Moreover, it is 


F. W. Westaway, op. cit., pages 271-272. 



other or theretTcc^„“”f “ dt*t.ngu.shed from eac 

ontrary .3 the truth, however Qutfo ' 

.“a ■” mZo71 '“‘opiated Ae ' 

&ome attp '*■ ^’*** if educatinn^ ^jifferent if educatior 

that although his studi i, example, the a\er«» presupposed by 

the rationality of the wild “rtam beliefs rrf5h™”‘j®‘ ’’“d'ly admits 

as a scientist, called upon ii i**" "ttative attainaLhIv ““dormity of nature, 
on^f *^at such justific**!^^^**^^ notions to ent ^ he is not himself, 
™ of science, or that f sul 'To” “"iuiariy reom ? examination He can 
by tho philosopher nhoM^t ttsMcaliou « demiideH “""a' carrying 

rl“ I'lf "" ■■■ ““~=S 

"“2 "- ■»« 

philosophy, both beinir*! *’ no distmefi ^ thinkers, Plato 

of truth n During theiSr^ oommon bond" f 
otmd Itself uuequaUy honever, "to n„ 'b'" 

>oter Renaissanc oL i, "■d>apilvv„v a 

heg.nn.ng with phys,! ff “»ther of the " a Since the 

busing and chemistry, left th^ft""'; ‘’‘’™ tins nmon, 

dsnbtless Perfonned a “ aort of -WeiT^ “d ^nt up m 

happens when disconte function for fh^ i'° ”®'hal weaning," which 

conr^T hlother Philosontf "“thing at all | "dependence, science 
of the ™ ““"tinned to the nr^ Prolonged ner 7° parental 

brothT'“''“"dshave so P T"‘ ^"rtunaf e "doles- 

result '^1^^PP*°ess to moth yp recent years some 

I!!l‘Jledueat.on » ""d da,.ghCa,ir^r "'"“h has 

W describes the 


'“‘“gnters »'rucn 

t-l "“.•mu, s,, 

“ror.r'-?" V™' '■• P'-op^u.uo. . ^ 

I,raiy“"«"“tEcirS ,l,^'“P'erI MeJ 

SSS?'«f'S5's.~iav 

** Journal of Edu, 




-/®Ij pant itntZ >”te-T>retat.0D, rti 


While competent students of education recognize that both philosophy 
and science are necessary for a complete act of thinking, each field is 
so broad as to make a certain division of labor necessary. The situation has 
been well stated as follows: 


If one specializes in the critical examination of educational theories, hypotheses, 
and generalizations in the light of data which are already available, we call him an 
educational philosopher. If one specializes in the solving of educational problems 
[illegible] appeals to experience through systematic, controlled and uncontrolled 
observation, in field or laboratory, we call him an educational scientist in 
the classical sense of the term. 


But even the most competent philosopher is forced to take his science 
at second hand from the data made available by specialists in science, for 
it is too much to expect him at the same time to be an expert experimentalist. 
Conversely, a competent scientist is forced to take his philosophy 
largely at second hand while he conducts his persistent search for new facts. 
Moreover, not only does a scientist have to borrow his philosophy, but 
much of his science also. 

This borrowing, though necessary, is risky business not only for the 
philosopher but for the scientist as well. Not only may he be unfortunate 
in what he borrows, but its meaning to him is inevitably colored by the 
mental background imposed by his specialty. The important point is that 
science and philosophy are reciprocally related, as inseparably linked as are 
heredity and environment in the growth of a living organism. Buckingham 
states the relationship concisely: "As fields of human endeavor, science and 
philosophy supplement each other. Without philosophy, science is 
incomplete; without science, philosophy is barren." An educational philosopher 
puts the relationship as follows: "While philosophy must be the 
general to plan the grand strategy of education, it will need science as its 
staff officer." Broadly conceived, education then is both science and philosophy. 
As a science it belongs to the group known as the social sciences, 
whose data are the most complex of all. While education is not, and doubtless 
never will be, as thoroughgoing a science as physics or chemistry, it has 
nevertheless made more progress toward the scientific treatment of its 


Carter V. Good, A. S. Barr, and Douglas E. Scates, The Methodology of Educational 
Research, page 24. New York: D. Appleton-Century Company. 

In commenting upon William James, Boring makes this disturbing observation: 
"It is too bad, but no one has ever yet succeeded in being both a good philosopher and 
a good experimentalist." E. G. Boring, op. cit., page [illegible]. 

B. R. Buckingham, "The Philosophy and Organization of Research," School and [illegible] 

John [illegible] New York: McGraw-Hill Book Company, Inc., 1939. 




problems during the twentieth century than in all the centuries preceding. 

But what of art in relation to science and philosophy? Will Durant puts 
the matter clearly and graphically: 

Every science begins as philosophy and ends as art; it arises in hypothesis and 
flows into achievement. Philosophy is the front trench in the siege of truth. 
Science is the captured territory; and behind it are those secure regions in which 
knowledge and art build our imperfect and marvelous world. 

William H. Payne long ago suggested that "in the slow but sure evolution 
of human opinion, a science of education is beginning to emerge from the 
art of education." But there is also a reciprocal relationship, for as 
Doughton points out, "the sure foundation of an effective teaching art is 
a science of education." 

Science consists in knowing, while art refers to doing and implies skill and 
aesthetic excellence. An outstanding teacher might be either an artist or 
a scientist, although the ideal teacher must be something of both. No matter 
how great the artist or with how much inspiration he wields the brush, 
the pigments are mixed according to formula. 

The place of measurement in education. What, then, is the rightful 
place of measurement in education, which is at once a science, a philosophy, 
and an art? The answer varies somewhat with the point of view of the 
observer. Naturally the role of measurement appears more important to 
those educators whose specialty is science than to those whose specialty is 
philosophy. At times these views become so divergent as to appear wholly 
irreconcilable. The quotations at the top of the next page, from outstanding 
early educational leaders in a single institution, will make this clear. 

It is always dangerous, of course, to detach a statement from its context. 
However, the strong language employed, containing such terms as 
"thoroughly," "indispensable," "final," "fallacy," "never," and "always," 
scarcely leaves room to hope that these statements are entirely harmonious. 
Doubtless, any attempt to interpret the above quotations should take into 
consideration the publication dates. It will be noted that views of the 
educational scientists represent an earlier period. They appeared soon after 
World War I, when the success of the Army Alpha test was still fresh in the 
minds of psychologists. Moreover, at that time, atomic physics still dominated 
all science, including psychology, and behaviorism was in the ascendancy 
in America. The relatively more recent quotations from educational 
philosophers reflect a newer atmosphere. The recognition of the 
principle of indeterminacy in physics has had a sobering effect on science 
in general, and followers of the Gestalt school of psychology in particular 

S-3 Ne, y„k S „j Selicer, 
■•■sell Ne, York 

in's,;' '"'k*'™ nsge 





TWO VIEWS OF MEASUREMENT IN EDUCATION 


Educational Scientist 

Whatever exists at all exists in some amount. To know it thoroughly involves 
knowing its quantity as well as its quality. 

Measurement is indispensable to the growth of scientific education. The 
final answer to every educational question, except one, must be left to the 
educational measurer and must await the development of education as a 
science. 


Educational Philosopher 

Yet another false concept in the climate gripping the American scholar thwarts 
his study of Man. This is the fallacy that "I know only what I can describe 
quantitatively; whatever exists, exists in some measurable amount." 

And I should myself like further to conclude that education can never become 
a science . . . always— so long as this world stands — will there be problems, 
nay regions of problems, with which the processes of "exact" science are 
insufficient to cope. 


have protested against what they regard as the atomic conception of mind. 
It is, therefore, quite possible that the present views of the two groups are 
much closer together than the above statements would indicate. However, 
the following statement from McCall strongly suggests that not all differences 
have been ironed out: 


Certain extreme exponents of the organismic (often called Gestalt) view contend 
that any organism is more than the sum of its parts, and that adding test scores is 
like trying to make a man by sticking together a head, a trunk, two arms, and two 
legs. But a reading score cannot be properly compared to one leg. It is not a 
broken-off fragment of the mind. In a very real sense, a reading score tends to measure the 
entire organism functioning in that reading situation. 

Mental measurements are essentially similar to bodily measurements. If anyone 
proposed to abolish the making and use of the measurements of pulse, temperature, 
blood pressure, et cetera, we would call him crazy, and if anyone proposes to abolish 
the making and use of mental measurements, he, too, should be called — I hesitate 
to say what, since somehow I must manage to live with certain of my colleagues 
after this is published, but surely something other than an organismic philosopher 
or a Gestalt psychologist! 


Both scientists and philosophers attempt to test their generalizations 
before finally accepting them. With science the process is the straightforward 
one of subjecting all such generalizations to rigid mathematical or 

Edward L. Thorndike, Seventeenth Yearbook of the National Society for the Study of 
Education, Part II, page 16. 1918. 

Harold Rugg, "The American Scholar Faces a Social Crisis," The Social Frontier, 
1:12, March 1935. 

William A. McCall, How to Measure in Education, pages 7, 9. New York: The 
Macmillan Company, 1923. 

William H. Kilpatrick, School and Society, 30, July 13, 1929. 

[illegible], published by the Bureau of Publications, Teachers College, 
Columbia University, December, 1936. 





experimental verification. With philosophy the process appears more involved. 
Kilpatrick, for example, recognizes two distinct situations: "simple 
prophecies" and "decisions on appropriate conduct or policy." For the 
former, he agrees that "'verification' is an appropriate term and measurement 
(when available) is a proper means of testing." For the latter, however, 
he insists that "'verification' is not an appropriate term and techniques 
of measurement are not in themselves adequate." He continues: 
"In such cases the function of measurement is not to supplant or to supply 
decisions, but to furnish regarding the working of the policy under review, 
more and better data in the light of which a fresh and better decision can 
be made." Apparently, then, whenever actual verification is possible, this 
philosopher at any rate is willing to assign to measurement the job of doing 
it, and even in the other cases he assigns to it the necessary, if humble, 
duty of providing at least part of the data required. Perhaps, then, it is 
not an unfair statement to say that the scientist always assigns to measurement 
a fundamental role, whereas the philosopher sometimes does so. However, 
at all times the philosopher seems willing to ascribe to measurement 
an important, even if not a fundamental, place in education. 

Take the important matter of guidance, for example. Even its most 
enthusiastic supporter would hardly characterize guidance as a full-fledged 
science. Yet measurement provides some of the essential data in any sound 
guidance program. To describe a pupil as of weak scholarship and of low 
mentality is to leave his status vague and unsatisfactory. But to say that 
he has a percentile rank of 20 on the California Achievement Tests and 
an IQ of 84 on the Revised Stanford-Binet Intelligence Scale is to describe 
him in reasonably precise and meaningful terms. 
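
The two figures in this illustration can be made concrete with a short sketch. It is offered only as an illustration under stated assumptions, not as the scoring procedure of either test named above: the norm-group scores, the pupil's raw score, and the ages are hypothetical; the percentile rank uses one common convention (percent of norm scores below, plus half of any ties); and the IQ uses the classical ratio definition, mental age divided by chronological age, times 100, which the text itself does not spell out.

```python
def percentile_rank(score, norm_scores):
    """Percent of norm-group scores below `score`, counting half of any ties.

    This is one common convention; published tests supply their own norm tables.
    """
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


def ratio_iq(mental_age_months, chronological_age_months):
    """Classical ratio IQ: 100 * mental age / chronological age (assumed definition)."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical norm group of ten raw scores and a hypothetical pupil.
norms = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_rank(47, norms))   # 20.0: two of the ten norm scores fall below 47
print(ratio_iq(126, 150))           # 84.0: mental age 10-6 at chronological age 12-6
```

The point of the sketch is the one made in the paragraph: each number locates the pupil relative to a defined reference group or scale, which a phrase such as "weak scholarship" cannot do.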

In this connection it is well to observe that measurement is always a 
means to an end and never an end in itself. A measurement is simply 
a quantitative description of observed data. The significance or educational 
implications of the measurement are rarely self-evident or automatic. As a 
rule, the true significance of the measurement can be determined only when 
it is seen in relation to other relevant factors and is fitted into the total 
pattern of the situation. The term evaluation, as distinguished from measurement, 
is often used to refer to the process of appraising the "whole" 
child or the "entire" educational situation. 


The three R's in education. Everyone is familiar with the famous
trinity of R's in education, "Readin', 'Ritin', and 'Rithmetic." These,
[...] education, the curriculum of [...] concerned with the [...].
Educators have, in general, employed three principal methods of settling
educational issues and of arriving at educational principles and policies.
These constitute a new series of R's: Rhetoric, Reputation, and Research.

Historically the first of these methods may be termed that of Rhetoric.
It is the method par excellence of politicians, although not unknown in
education, especially among the reformers of every period. The method is
usually most dangerous when used orally. It is too well known to require
detailed discussion here. Abe Martin's famous definition of an orator as "a
public speaker not unduly hampered by the facts" indicates rhetoric's
limitations. The danger is that the personality of the speaker may outweigh
the merits of the case, and the artistic form of the speech may have more
influence than its content. Naturally measurement and quantitative data
are usually irrelevant, if not positively in the way. As a matter of fact,
a Speaker of the House of Representatives once attributed the decline in
oratory chiefly to the "general diffusion in knowledge," since "as a rule
the more information a man has the less emotional he is, and the orator's
appeal was to the emotions far more than to the understanding."

The second method of determining educational theory and practice may
be termed that of Reputation. According to it, the settlement of an
educational issue is the simple matter of finding out what has been said on
the question by some persons whose reputations in the field are sufficiently
great to make them accepted as authorities. This method has been the
dominant one in education until recently and is still widely used. It has a
legitimate and necessary place in education, as it does in law and medicine,
where one must rely for the solution of many practical problems upon the
professional judgment of acceptable authorities. But the method is not
without its dangers, which are so important as to warrant brief discussion.
In the first place, the authority may be mistaken. Reputation is
unfortunately no guarantee of reliability. Until comparatively modern times
the wisest persons were quite certain that the sun each day made a complete
journey around our flat earth. Such divergent views on practically
every phase of education as are expressed in current educational journals
and in public addresses by our most eminent educational leaders are ample
assurance that men of the highest reputation may be mistaken. In the
second place, the authority may be misquoted. A few years ago at a meeting
of the American Psychological Association a speaker quoted what purported
to be a statement from an outstanding psychologist. At the conclusion
of the address, this psychologist arose to explain that he had never
made any such statement and in fact believed quite the contrary. It is too
much to expect, however, that the authority will always be present to
correct his alleged quotations. A third danger is that conditions may have
changed so greatly that a statement once true may no longer be applicable.
For example, George Washington's warning against foreign entanglements,
made in 1797, when the United States consisted of 16 states whose total
population, barely 5,000,000, was separated from Europe by a broad
Atlantic, need not be at all applicable to a nation of 48 states, with a
population of more than 150,000,000, joined to Europe by modern agencies
of communication. The method is thus seen to be beset by many dangers.
It must, therefore, be used with caution. The necessity of extreme care in
the selection of the authority cannot be overemphasized. It is usually wise,
also, to examine the evidence that lies behind the statement and the
conditions under which it was made. Reputation alone must not be thought of
as adequate assurance of reliability. At best the reliance upon reputation
involves considerable risk. It is always well to consider the circumstances
under which the statement was made as well as the data upon which it was
based.

Irvin S. Cobb described a prominent Southern orator as one who can "make a
song of a syllable and turn any reasonably long word into an anthem." The
Courier-Journal, Louisville, Kentucky, June [...].

Champ Clark, "Is Congressional Oratory a Lost Art?" Century, 81:310,
December [...].

Educators of course have no monopoly here. Bertrand Russell reports the
amusing but instructive example of Todhunter, the mathematician, who opposed
the establishment [...] experimental laboratory at Cambridge [...] for
students to see experiments performed, since the [...] their teachers, all
of whom were men of the highest character, including many clergymen in the
Church of England!


The third method of arriving at truth is that of Research. This method
is comparatively recent in the history of man. The prestige of the orator
and rhetorician in ancient Greece and Rome and the authority of Aristotle
in the Middle Ages testify to the newness of the method of research. And it
is more recent in education and the other social sciences than in the
physical and biological sciences. Its appeal is to the intellect and is based
upon the facts in the case. It is the distinctive method of science and may
be regarded as the only final method of settling an educational issue. A
single illustration will indicate its superiority over the earlier methods
used.


A practical problem in education is to determine the proper amount of
time to be devoted to each subject taught. Educators have usually assumed
that the results obtained are directly proportional to the amount of time
expended. In fact, the thing seemed self-evident. In the closing years of the
last century, however, an inquisitive American physician by the name of
Rice undertook, apparently for the first time, to subject the question to
scientific study. The subject chosen was spelling, and the procedure was
extremely simple and direct. A uniform, although not standardized, spelling
test was administered to schools in various parts of the country. Afterward
the results, 100,000 cases, were tabulated according to the [...] in the
program. Contrary to the usual assumption, Rice found little or no relation
between the results


Someone has suggested that theories often continue to live long after their
brains have been knocked out.

obtained and the time expended. Equally good spelling achievement was
found in schools where a period of ten or fifteen minutes was devoted to the
subject as in those where a period three or four times as long was allotted.
Although considerable skepticism was manifested toward the Rice inquiry
at the beginning, the evidence was so convincing as to compel assent.
Today few schools allot more than fifteen minutes a day to spelling in the
school program. The solution of practical problems in education by the
method of research had thus made a promising beginning.
Forty-one years later Tyler summarized the educational situation as
follows:


The proceedings of educational associations during the latter part of the
nineteenth century indicate clearly an attempt to settle teaching problems
by argument, by impassioned pleas, or by consensus. The achievement-testing
movement provided a new tool by which educational problems could be studied
systematically in terms of more objective evidence regarding the effects
produced in pupils. The hope that problems could be settled by reference to
fact rather than subjective impression or emotionally colored opinions has
probably been the strongest influence of the achievement-testing movement in
the past forty-one years.


Good, Barr, and Scates classify research methods in education under
four headings: historical, normative-survey, experimental, and other
methods. Of these four methods only the historical is not dependent upon
measurement in some form, and even this method is likely to make use of
numerical data.


The function of measurement in instruction and in school
administration. The foregoing discussion has been concerned primarily with
the role of measurement in educational research, or in education considered
as a science. But education is an art as well as a science. It has its
practical as well as its theoretical aspects. It is a primary purpose of this
book to consider such immediately practical problems as the relationship of
measurement to the actual administration of schools and the instruction
of pupils in these schools. Later chapters will consider measurement in
instruction and measurement in school administration. There is some
overlapping among these two major [...], each of which will be subdivided
still further. But the organization is a convenient one, even if somewhat
[...]

Joseph M. Rice, "The Futility of the Spelling Grind," Forum, 23:163-172,
409-419, April and June, 1897. Later studies have found a similar lack of
relationship in other subjects. See Merrill T. Eaton, "A Survey of the
Achievement in Social Studies of 10,220 Sixth Grade Pupils in 464 Schools in
Indiana," Bulletin of the School of Education, Indiana University, 20, No. 3,
1944. 68 pages.

Rice was an important progressive education pioneer. During the years
1891-18[..] he investigated American education relentlessly and published a
series of 20 articles in the Forum which even today sound startlingly modern.
The first was "Need School Be a Blight to Child Life?" (12:529-535,
December, 1891); the last, "Why Teachers Have No Professional Standing"
(27:452-46[.], 189[.]).

See Douglas E. Scates, "Fifty Years of Objective Measurement and Research in
Education," Journal of Educational Research, 41:241-264, December, 1947.

[...] Thirty-Seventh Yearbook of the National Society for the Study of
Education, [...] "The Scientific Movement in Education," page [...]. Quoted
by permission of the Society. Bloomington, Illinois: Public School Publishing
Company [...].

Carter V. Good, A. S. Barr, and Douglas E. Scates, [...] pages.

It must be admitted at the outset that up to the present time research,
while not entirely lacking, is by no means sufficient to prove conclusively
that measurement really serves the above practical functions. However,
though experimental evidence for the practical value of measurement is
somewhat meager, objective support for the view that measurement is useless
or harmful is virtually nonexistent. The present case for measurement
in education rests to a large extent upon the testimony of experienced
teachers and school administrators, and the argumentative ability of persons
enjoying the highest reputation in the field. This is one of the many
points in education where further research is needed.

Examples of existing experimental evidence as to the value of measurement
in instruction are the following:


Schutte found that normal school students who expected final examinations did
significantly better than those who did not. Kulp found weekly tests increased
the amount learned in educational sociology by about 17 per cent. Turney found
that educational psychology students who took twelve short tests did about
20 per cent better than others who took only the mid-term and final
examinations. Jones found that psychology students who took a five-minute
test after each lecture retained after eight weeks approximately twice as
much as those who did not. Keys found that the same tests administered in the
form of weekly rather than monthly examinations in educational psychology
gave an immediate superiority of 12 per cent.


Great strides have recently been taken at all school levels (elementary,
secondary, and college) toward relating tests to educational objectives.
These are set forth in considerable detail by 20 specialists in a highly
important book, Educational Measurement. Trends in the research literature
are summarized frequently by the Review of Educational Research,
whose issues contain rather comprehensive bibliographies.


Three articles illustrating this point of view are Hans C. Gordon, "How [...]
Test Scores to Understand the Needs of Pupils," in Growing Points in
Educational Research, [...] American Educational Research Association [...];
[...] Teaching," Journal of Health and Physical Education [...]; and [...]
"Tests and Measurements [...]," Educational Administration and Supervision,
[...] 49-58, January [...].

ChaP'o'- I> 

44uW„n i, c ASiencn 

pj^ebolosiral Twtm?. 20 l^^Sniary^ 

CIa^lionaUedP.Utog.Jat^eSSrS Tm 



[...] classification of the instruments of measurement employed in the
ordinary school for distinctly educational purposes is given in the
following outline:


A. Oral
B. Written
   1. Informal (nonstandardized)
      a. Essay
      b. Objective
   2. Formal (standardized)
      a. Achievement
         (1) General (survey)
         (2) Specific (diagnostic, practice, etc.)
      b. Intelligence
         (1) General (individual and group)
         (2) Specific (aptitude or prognosis)
      c. Personality and Interests


The distinction between the major categories, oral and written, is
obvious. The distinction between informal and formal written tests is also
easy to make. A formal test often begins as an informal test which is later
subjected to experimental trial and revision, only the best items surviving
the process. Formal tests also have carefully worded instructions, both for
administering and scoring, and usually norms for interpreting the results.
The distinctions among tests of achievement, intelligence, personality,
and interests are not so clear-cut, however. By the term achievement tests is
meant tests of academic achievement, such as arithmetic or algebra; they
are distinguished from most personality and interest tests, which are
self-report rather than ability measures. Intelligence tests, theoretically
at least, are measures of learning capacity, whereas achievement tests are
measures of learning itself. In other words, intelligence tests attempt to
measure educability, while achievement tests attempt to measure education.
The writers have followed the usual practice of recognizing tests of
achievement and intelligence as co-ordinate with tests of personality and
interests. Strictly speaking, however, achievement and intelligence are
merely aspects of personality, which is a term used by psychologists to
include every trait that differentiates one individual from another. In a
certain sense, then, every test is a test of personality, and many aspects of
personality cannot be measured by tests at all but are evaluated by means
Lee J. Cronbach, Essentials of Psychological Testing, pages 14-15. New York:
Harper & Brothers, 1949.

Tilton comes to the interesting conclusion that his data are fairly
consistent with a concept of a general ability to learn and with the
identification of it with the general intelligence test. John W. Tilton,
"The Intercorrelations between Measures of School Learning," Journal of
Psychology, [...], January, 19[..]. Page [...].

of rating scales, questionnaires, interviews, controlled observation, and the
like.

Tests are subdivided into general and specific on the basis of scope. They
may also be further subdivided into individual and group tests on the basis
of method of administration, and into verbal and nonverbal or performance
tests on the basis of content. A distinction is often made, although not
always observed in practice, between a test and a scale. A test consists of
a series of questions to be answered or exercises of some sort to be done,
and a pupil's performance is the number of these he is able to do in the
time allotted. Strictly speaking, a scale consists of a series of specimens,
such as handwriting, for example, arranged in order of merit, and the
pupil's performance is judged by comparing it with the standard specimens.
Most standardized instruments of measurement are really tests rather than
scales. The test items are frequently arranged in order of difficulty,
however, in which case the term scaled test is sometimes used to distinguish
such tests from those in which the items are not so arranged.

Of course, many other types of measurement are employed in schools.
Examples of these are chronological age, height, weight, temperature, and
time, but these can hardly be classified as strictly educational. It cannot
be too strongly emphasized that measurement is not limited to tests and
examinations, and certainly not to standardized tests. There are also
numerous rating scales and check lists for playgrounds, buildings and
equipment, and so forth, whose use is largely restricted to the specialist.
These have been omitted in the interest of brevity.

It must be recognized that recent tendencies in education have enlarged
its scope and increased its complexity, and have thereby added to the
difficulty of teaching and administration. But nowhere have these difficulties
been more apparent than in the problem of measurement. The need for
proper evaluation is as great in the modern school as ever before, but the
difficulties of providing for it are vastly greater. For, as Saucier points
out, "an instrument of measurement may meet all the criteria for measuring a
reactionary, undemocratic conception of education but at the same time
be valueless for measuring the major results of a progressive, democratic
theory of education." This means that as the schools improve, so must
the tools and technique of measurement and evaluation.
A quotation from Scates seems to sum up the central theme of this
chapter quite aptly:


For a discussion of modern scale analysis see Samuel A. Stouffer and [...],
pages 308-[...]. Boston [...].

What has the measurement movement done for us? One way of answering
this question is to note what we should not have if we had no [...]
of current scientific work in education. For
there cannot be a science without fairly precise quantification; not that
science is measurement, but that traits which are devoid of any reasonably
definite quality simply do not have the required specificity for entering
into the careful thinking essential to science. When quantities are
disregarded almost any generalization becomes true. There is an infinitesimal
element of truth in practically anything which might be said. Quantities are
part of the nature of truth and are therefore an essential part of science.





2 


The Historical Development of 
Measurement in Education 


A. Introduction

Tests and measurements of one kind or another have played a far more
prominent role in human history than is generally recognized. Nor has
their use by any means been confined to the schools. In fact, among the
earliest records of the use of various testing devices are those found in the
Bible, although they generally have no direct reference to education. One
illustration will suffice:

And the Gileadites took the passages of Jordan before the Ephraimites: and it
was so, that when those Ephraimites which were escaped said, Let me go over;
that the men of Gilead said unto him, Art thou an Ephraimite? If he said, Nay;
then said they unto him, Say now Shibboleth: and he said Sibboleth: for he could
not frame to pronounce it right. Then they took him, and slew him at the passages
of Jordan: and there fell at that time of the Ephraimites forty and two thousand.

Attention is called to the fact that here is indeed a "final examination,"
and in a field other than education. Doubtless measurement experts of the
present time would point out that, in spite of a rather high degree of
objectivity, there were certain dubious features: it was oral, it was very
short, and the mortality rate was excessively high!

A sociologist attributes the remarkable stability of the Chinese
civilization, the oldest culture of any modern nation, to five factors, one
of which is her highly organized examination system. It began informally in
225 B.C.
Judges 12:5-6 (King James Version).

Paul F. Cressey, "The Influence of the Literary Examination System on the
Development of Chinese Civilization," American Journal of Sociology,
35:250-262, September, 1929.

and became a definite civil service examination system in 29 B.C. The
system, described as being thoroughly democratic, ruthless, [...], and
orthodox, has had profound effects, some good and some bad, not only upon
the educational system of China, but also upon her whole [...].
On the one hand, it has preserved unity by keeping uniform throughout
the empire the written language, literature, and traditions of the Chinese
nation and has helped to maintain political stability by keeping open to
every citizen the door to prestige and power. On the other hand, it has often
produced more graduates than could be given positions, has offered little
assurance that the successful candidates possessed the qualities necessary
for good officials, has sometimes resulted in corruption in the conduct of the
examinations, and has in some degree stifled progress.
Some kind of measurement or evaluation seems inevitable in education.
It seems inherently an essential part of the teaching process. The situation
has been well expressed as follows:

As far back as we have any record of school routines, teachers have always
examined or tested, as well as taught. But our attitudes towards these two
functions have been, historically, quite different. We have long understood
that teaching is a highly skilled business, a profession calling for special
aptitudes and extensive preparation, and that both its techniques and its
objectives are worthy of the most careful investigation. But examining or
testing we have taken for granted as something that anybody could do any
time, quite casually, for any purpose he might happen to think of. It is only
yesterday that it occurred to most of us that there might be skilled
techniques of testing or the uses we were accustomed to make of our tests and
examinations might be open to question.

Every teacher or administrator of more than twenty years' service will recall
with me that Age of Innocence when a "test" regularly consisted of ten
questions, sometimes concocted impromptu as we wrote them on the blackboard,
each weighted, by our arbitrary personal fiat, with a value of 10 on a scale
of 100, and when the perfectly simple purpose of any "test" was to "pass" or
"flunk" the testees. We knew no qualms in those days about reliabilities or
validities or comparability, and the sigma lay as far in the future (for us
teachers) as television.

It was only after the World War that this primal innocence was disturbed by
the coming into the consciousness of teachers generally, as distinguished
from the psychologists, of what many of us still think of as "the new tests."
A bewildering series of strange inventions: intelligence tests first, and
then objective achievement and aptitude tests, and interest tests, and
personality inventories and ratings. Nearly all of them appallingly
elaborate, and alleged to have been most laboriously [...]

[...] is instructive. In 1917 the Soviet Government [...] examinations and
school marks [...]. "Testing Russian Students," School and Society,
50:25-2[.], July 1, 1939.

Max McConn, "Examinations Old and New: Their Uses and Abuses," Educational
Record, [...] 375 [...], October, 193[.].


The foregoing picturesque statement is accurate enough for "teachers
generally," as the author intended, but is far from true of certain
outstanding leaders in the profession. Horace Mann, for example, almost one
hundred years ago, had a remarkable conception both of the importance of
examinations and of the limitations of the forms then in existence. His
penetrating analysis of the weaknesses of the oral examinations then in
vogue, and of the superiority of written examinations, could hardly be
improved upon by the modern specialist in measurement. Mann showed
clearly the points where the oral examinations were lacking, in the technical
language of today, in validity, reliability, and usability.

Another American educator who understood both the value and the
limitations of examinations was Emerson E. White, widely known as a
writer and school administrator. In 1886 he wrote: "It may be stated as a
general fact that school instruction and study are never much wider or better
than the tests by which they are measured." In the same volume the
author enumerates several "special advantages" of the written test:


It is more impartial than the oral test, since it gives all the pupils the
same tests and an equal opportunity to meet them; its results are more
tangible and reliable; it discloses more accurately the comparative progress
of the different pupils, information of value to the teacher; it reveals more
clearly defects in teaching and study, and thus assists in their correction;
it emphasizes more distinctly the importance of accuracy and fullness in the
expression of knowledge; it reveals more fully than the ordinary language
exercise the ability of the pupil to write correctly when his attention is
directed to the thought or subject-matter; it is at least an equal test of
the thought-power or intelligence of pupils, since this result in both
methods is dependent upon the nature of the tests; and lastly, the certainty
of the coming written test affords a healthy stimulus to pupils, increasing
their attention to instruction, and their efforts to master the subjects
taught.

These views of Mann and White appear surprisingly modern and show
how far the practice of the rank and file is likely to fall behind the theory
of the pioneer thinker. It is doubtful if any single sentence in recent
educational literature states the superiority of the written over the oral
examination more completely or more forcefully than the one just quoted from


Otis W. Caldwell and Stuart A. Courtis, Then and Now in Education, pages
[.]7-41. Yonkers: World Book Company, 1923.

These terms will be explained in Chapter 4.

Emerson E. White, The Elements of Pedagogy, page [...]. New York: American
Book Company [...].

[...] 197-198. However, in an unusual article dealing with [...] Education,
30 [...], January [...].

White. In fact, many modern specialists in measurement would probably
accept the above indictment of oral tests in toto. They would, of course,
wish to discount somewhat the values so enthusiastically proclaimed for
ordinary written examinations, and would point out that many of the
limitations of the oral tests so forcefully stated also hold in some degree
for the written tests, and in addition that the latter have some special
limitations of their own not then recognized. But that is another story, to
be told later.


B. The History of Intelligence Tests

In Jevons' The Principles of Science, published in 1874, occurred this
significant statement:

As physical science advances, it becomes more and more accurately
quantitative. Questions of simple logical fact after a time resolve
themselves into questions of degree, time, distance, or weight. Forces hardly
suspected to exist by one generation, are clearly recognized by the next, and
precisely measured by the third generation. But one condition of this rapid
advance is the invention of suitable instruments of measurement. Accordingly
the introduction of a new instrument often forms an epoch in the history of
science.


While the foregoing statement was intended as a history of the past
development of physical science, it is also a remarkably accurate prophecy
of the future development of measurement in psychology, which Jevons
appears to have foreseen, as indicated by his reference to the "fact that
man in his economical, sanitary, intellectual, aesthetic, or moral relations
may become the subject of exact sciences, the highest and most useful of
all sciences." This statement is all the more remarkable when one considers
that it was made five years before Wundt established the first
psychological laboratory and Galton began publishing his most important
studies of individual differences, while both Binet and Cattell were lads in
their teens, and before either Thorndike or Terman had been born. But it
was over a quarter of a century before any very definite progress was made
toward fulfilling the prophecy. Then there followed rapid progress in that
direction along several lines. That story will now briefly be told.

Germany and experimental psychology. An important event in the
history of psychology was the establishment of the first experimental
laboratory in psychology by Wilhelm Wundt at Leipzig in 1879. He was,
however, primarily interested in the analysis of consciousness into elements
in a manner analogous to that employed in atomic chemistry. He was
unsympathetic to the problem of individual differences, but he did influence
considerably the course of psychology, especially the work of other German
psychologists, such as Kraepelin, Ebbinghaus, and Meumann, who introduced
many forms of separate tests, which were borrowed by later investigators in
constructing their scales for measuring intelligence. Of these, the
completion test of Ebbinghaus was doubtless the most important.

Another important idea, suggested in 1912 by Stern, was that of
representing intelligence as the ratio of mental age to chronological age.
This concept, for which Stern suggested the term "mental quotient," was
later adopted by Terman as the familiar IQ.
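
The ratio described above can be illustrated with a short sketch. It is an
illustration only, not a procedure taken from the text: the function name
ratio_iq, the use of months as the unit of age, and the multiplication by
100 (the customary way of expressing the quotient as an IQ) are assumptions
introduced here, and the child in the example is hypothetical.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100.

    The factor of 100 is the customary convention and is assumed here.
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical child: mental age of 10 years, chronological age of 8 years.
print(ratio_iq(10 * 12, 8 * 12))  # 125.0
```

A child whose mental age equals his chronological age receives a quotient
of 100 under this convention, whatever his actual age.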

England and statistical methods. The distinctive contribution of
the English to the measurement of intelligence has been that of statistical
methods as a tool for the analysis of test results. Sir Francis Galton, one of
the most brilliant and versatile men of the nineteenth century, was the first
to treat seriously the problem of individual differences in psychology,
particularly in the realm of sensory discrimination, although Weber, Fechner,
Helmholtz, and others had given slight attention to it in what is often
termed psychophysics. In 1883 Galton outlined a method for studying free
association by quantitative methods. But his most notable contribution
was in statistical analysis, where he suggested among other things a
graphical method of representing correlations. Karl Pearson, a pupil of Galton,
and Charles E. Spearman still further advanced the science of statistics.
Spearman developed his well-known two-factor theory of intelligence on
the basis of statistical analysis. Cyril Burt, who has been a leader in
introducing and adopting Binet's work in Great Britain, was in 1913 officially
appointed school psychologist, possibly the first person in the world to
occupy that position.

France and abnormal psychology. The French have long been leaders
in abnormal psychology. Consequently, they approached the problem of
measuring intelligence from the standpoint of the classification and
treatment of the mentally defective. This brings us to the most important name
in the history of intelligence testing, Alfred Binet.

It would be hard to find a man who better illustrates Jevons' description
of the methods of the genius than Binet. Jevons says:

It would be a complete error to suppose that the great discoverer is one who
seizes at once unerringly upon the truth, or has any special method of divining it.
In all probability the errors of the great mind far exceed in number those of the
less vigorous one. Fertility of imagination and abundance of guesses at truth are
among the first requisites of discovery.

This reads as if it were designed specifically to describe Binet, and yet it
appeared twenty years before Binet founded L'Année Psychologique and
thirty years before his first scale for measuring intelligence. He first studied
law, then medicine, and afterward worked in a biological laboratory; later
he turned psychologist, first of the arm-chair variety, and finally
an experimentalist. Furthermore, in an effort to devise a suitable method
of measuring intelligence he tried out various head measurements,
physiognomy, graphology, and palmistry, before hitting upon the correct
approach. Binet never seemed to be quite sure what he meant by
"intelligence," what he was trying to measure, for he changed his definitions
repeatedly. It is clear, therefore, that he did not hit "at once unerringly
upon the truth," and that he did possess to a marked degree "fertility of
imagination and abundance of guesses." Such errors as he made, and they
were numerous, were not the unintelligent ones of blind trial and error,
but rather the intelligent errors of judgment, made by acting upon the
course which seemed most promising from a survey of the best available
facts at hand.

David G. Ryans, "Francis Galton's Statistical Contributions," School and
Society 48:312-316, September 3, 1938.

W. Stanley Jevons, op. cit., Book IV.

It is doubtless true, as Boring suggests:

At close view the course of science seems discontinuous; all at once a "genius"
makes a discovery or formulates a theory and productive research follows on
immediately. At the greater range of historical perspective, the course of science
seems to be continuous and the "genius" appears as an opportunist who takes
advantage of the preparation of the times.

Opinions will differ regarding the appropriateness of the word
"opportunist" in Binet's case, but there can be no doubt that he did take
"advantage of the preparation of the times." Both for his ideas and actual
test materials he drew freely from others, notably his fellow countryman,
Blin, and his contemporaries in Germany. Nevertheless, Binet did something
the others had not done; he began where they left off and continued with a
definite contribution both to the theory and practice of testing. On the
theory side he enlarged the prevailing concept of intelligence, introducing
such ideas as those of judgment, adaptation, and self-criticism. Terman
argues that Binet's outstanding contribution to psychometrics was his
abandonment of any attempt to measure "intellectual faculties as such."
To practice he contributed a technique of scale construction and a finished
scale consisting of test situations selected according to predetermined
criteria and standardized. The date 1905 is important, therefore, because it
marked the appearance of the first scale for the measurement of
intelligence, which, crude as it was, has served as the pattern for subsequent
tests and scales the world over. The 1908 revision was a definite improvement,
and is especially notable for the introduction of the mental age concept.
Further development resulted in the scale of 1911, the year of Binet's
death.



America and applied psychology. The scene now shifts to America,
where the outstanding name is J. McKeen Cattell, who was a pioneer along
many lines and a promoter of the first rank. More than anyone else Cattell
was responsible for giving to American psychology its practical bent, for
with him the practical took precedence over the philosophical. As early
as 1885 he began to publish important articles on reaction times and
individual differences. It was Cattell who in 1890 suggested the term "mental
tests," which was to become a sort of trade-mark for the whole measurement
movement. But Cattell was too close to Wundt's laboratory to escape
altogether the views of its master. Cattell, therefore, just as did Galton,
confined his tests largely to the simpler mental processes, such as sensory
discrimination, where individual differences are least, rather than to the
higher mental processes, where they are greatest. In other words, both
Galton and Cattell attempted to measure intelligence, but with the wrong
tools. Very little attention was given either to reliability or to validity.
Consequently, in 1901, when Wissler published his analysis of Cattell's
tests used with college students, in which he applied for the first time
the Pearson correlation technique to test scores, and in which he found
little more than chance relationship either among the tests themselves or
between the tests and college work, a considerable damper was thrown over
the enthusiasm of American testers, which was not lifted till after Binet had
published his 1905 scale. Nevertheless Cattell's influence upon measurement,
through both his writing and his students, notably Thorndike, has
been great.
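
For readers who have not met the Pearson correlation technique mentioned
above, the following sketch shows how the coefficient is computed from two
sets of scores. It is offered only as an illustration: the function name
pearson_r and the scores are hypothetical and are not Wissler's data or his
computing routine.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of five students on a laboratory test and in college work.
reaction_test = [12, 15, 11, 18, 14]
college_marks = [80, 72, 85, 78, 75]
r = pearson_r(reaction_test, college_marks)
print(round(r, 2))
```

A coefficient near zero corresponds to the "little more than chance
relationship" described above; values near +1 or -1 indicate a close
relationship.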

Goddard was probably the first American psychologist to recognize the
practical value of Binet's 1908 and 1911 scales, which he translated and
with minor adaptations tried out at Vineland. In 1911 and 1912 Kuhlmann
published his revisions of the 1911 Binet scale, extending it downward
to the age of three months, instead of three years, which was Binet's
lower limit.

It remained for Terman of Stanford University to provide the first
thoroughgoing revision, carefully adapted to and standardized for use with
American children, normal as well as subnormal. Terman's scale, known
as the Stanford Revision or Stanford-Binet, appeared in 1916, together
with a most complete manual, The Measurement of Intelligence. This revision
has been criticized on the ground that it was standardized entirely on
school children, which may result in somewhat of a handicap for persons of
poor academic background, and that it did not produce an even spread
in the distribution of IQ's, particularly at the higher ages. It has also
been criticized on the ground that its norms were based exclusively on the
children of one state, California, which may not be truly representative
of the United States as a whole. Nevertheless, the Stanford Revision was,
for more than two decades, the most widely used and most highly regarded
individual intelligence test in existence. In 1937 a thorough revision of the
Stanford-Binet appeared. This second revision corrected most of the
weaknesses of the first revision.

J. McK. Cattell, "Mental Tests and Measurements," Mind 15, 1890.

Clark Wissler, "The Correlation of Mental and Physical Tests," Psychological
Review.

Two other distinctly American developments, both aiming to make
intelligence tests more practical, remain to be discussed. These early tests
had two practical disadvantages which militated against their wide use.
One of these was that the tests were highly verbal; that is, their successful
administration required that the subject taking the test understand the
English language. The other was that the tests were individual; that is,
only one person could be examined at a time. Reasonably satisfactory
solutions came to both of these problems in the year 1917, and this leads
to an interesting story.

Intelligence tests, children of necessity. One cannot but be
impressed with the curious role of necessity in the development of intelligence
testing both in Europe and in America. Although it may be true, as Thorndike
suggests, that necessity is not the true mother of invention, she is,
often at least, the stern, relentless stepmother. Two instances in Europe
and two in America will suffice to make this clear.

In 1897 Ebbinghaus was appointed on a commission to investigate the
problem of fatigue in the schools of Breslau. As there were in existence no
appropriate tests, Ebbinghaus set about to devise them; the first
"completion tests" resulted from this endeavor. Seven years later the Minister
of Public Instruction in Paris became concerned about the high percentage
of failure in the Paris schools and appointed Binet on a commission to
determine those who were so mentally unfit as to necessitate instruction
in special classes. Binet, too, found available measuring instruments
inadequate for the purpose. Out of this difficulty emerged the 1905 scale
already referred to, the first successful instrument for measuring
intelligence according to modern conceptions. But the original Binet scales and
their early revisions, both in Europe and America, possessed the two
limitations mentioned above, namely, they were highly linguistic and they
were individual scales. Soon American ingenuity was to offer solutions to
both problems.


Figure 1. Test 7 from the Army Alpha.


Pintner and Paterson, finding the Stanford-Binet unsatisfactory for deaf
children, met the first difficulty by assembling a series of fifteen tests of
manipulation or performance, such as the form board, used by
Seguin, Healy, and others. This combination, which appeared in 1917, was
known as the Pintner-Paterson Performance Scale. That same year the
United States found herself in World War I, faced with the urgent necessity
of training a large citizen army with an insufficient supply of commissioned
and noncommissioned officers. In this emergency the American Psychological
Association placed its services at the disposal of the War Department.
The existing individual intelligence tests were not only entirely unsuited
for use with illiterate and foreign-speaking recruits, but they were also
much too slow. To meet this need a committee of psychologists, utilizing
largely the as yet unpublished work of Otis, devised the Army Alpha,
the first of a long succession of group tests. The second difficulty of the
early tests was thus solved, for a group test can be administered to a
hundred or more persons in the time formerly required for measuring one.

Figure 2. Picture completion test from the Army Beta.

It should be noted, however, that the Pintner-Paterson Performance Scale
and the Army Alpha group tests each solved but one difficulty at a
time. Tests of the Army Alpha type are even more verbal than the
individual tests had been. Figure 1 shows a sample page
from the Army Alpha. The early performance scales, on the other hand,
were nonverbal, but could be administered to only one person at a time.
The Army Beta, designed for illiterate and foreign-speaking soldiers, was
the first test to combine the group and performance ideas. It also appeared
in 1917. Figure 2 shows the picture completion test of the Army Beta.
Since that time several group tests, largely or primarily of the performance
type, have been designed specifically for use with young children just
entering school. There can be little doubt that World War I gave a decided
impetus to the measurement movement in America. World War II had a
similar and perhaps even more marked effect.

Tests of specific aptitude. All of the tests so far described have been
for the measurement of general intelligence. There has also been some
activity in the development of tests of specific intelligence, or capacity in a
restricted area, such as music or mechanics, or in a specific school subject,
such as algebra or Latin. America has also had the lead in the development
of these tests, often called aptitude or prognosis tests. One of the earliest
and in many ways one of the best known of these tests is the Seashore Test
of Musical Talent, which appeared in 1915. Three years later appeared the
Stenquist Test of General Mechanical Ability. In 1918, also, Rogers
published a test of mathematical ability, which, although hardly an aptitude
test in the modern sense, introduced the idea which other authors have
followed up by aptitude tests in the special branches of mathematics, such
as algebra and geometry. A somewhat different type of test on the college
level is illustrated by the Iowa Placement Examinations, which appeared
in 1924. A recent and promising type of test, of which there are several
examples, is that of reading readiness, to be used to determine a child's
fitness for the work of the first grade. There is evidence that the
development of the future is likely to be along the line of tests for specific
aptitude, rather than tests of general intelligence, which aim to cover the
whole range of human capacity at one shot. The test maker, as well as the
bird hunter, may aim at too large a target. Dunlap argues this side of the
case well: "The more 'general' the intelligence test, the less its value. By
increasing the specificity . . . we add to its value. Charles Dudley Warner
once shot a bear by 'aiming at it generally,' but it is a poor method."
Thurstone's attempt to devise tests of what he terms "primary mental
abilities" is a move in this direction, although these tests have not yet
demonstrated marked superiority over other tests in practical schoolroom
situations.

For a discussion of the Army General Classification Test, see Staff,
Personnel Research Section, "The Army General Classification Test,"
Psychological Bulletin 42, December 1945. A civilian edition of the AGCT is
published by Science Research Associates.

Knight Dunlap, Habits: Their Making and Unmaking. New York: Liveright
Publishing Corporation, 1932.


C. The History of Achievement Tests

Progress before 1918. The early history of things which have been in
existence a long time is usually somewhat obscure. This is true of
achievement tests, whose ancient use has already been referred to. Not only have
some kinds of tests been in existence for centuries, but attention has been
called to the fact that criticisms of them, both destructive and constructive,
are by no means new.

But the actual work of improving the existing instruments has always
lagged far behind the theory, and actual school practice has been furthest
behind of all. In spite of the marked superiority of written examinations
over oral, pointed out by Horace Mann in 1845, educators did not
forthwith adopt the former or improve the latter. However, as early as 1864
an English schoolmaster, the Reverend George Fisher, evidently realizing
the subjectivity of ordinary examinations, proposed a "Scale-Book," made
up of "various standard specimens arranged in order of merit."
But Ayres observes that "Mr. Fisher's efforts seem to have produced no
lasting results," for which he suggested this explanation:

Progress in the scientific study of education was not possible until people could
be brought to realize that human behavior was susceptible of quantitative study,
and until they had statistical methods with which to carry on their investigations.


Although Ayres felt that Galton's work had largely met these two needs,
he gave Dr. J. M. Rice the honor of being the "real inventor of the
comparative test" in America in 1894. Rice had studied in Germany and had
come under the influence of experimental psychologists both at Jena and
Leipzig. Here again the attitude of the educational leaders was anything
but cordial, and "for more than ten years but little progress was made
beyond the work of the pioneer himself." Ayres thus distinguishes between
the inventor of educational measurement and the father of the movement.
The latter distinction he awards to Dr. Edward L. Thorndike. The honor is
richly merited, for no other person has touched the measurement movement
at so many points or has contributed so much to it. In addition to his very
influential publications on statistical methods in education and his pioneer
work on intelligence tests for college entrance, either Thorndike or his
students were responsible for most of the early standard tests and scales for
measuring achievement. The first test was the Stone Arithmetic Test,
published in 1908, and the first scale was the Thorndike Handwriting Scale,
announced in 1909 and published the following year. The next few years saw
the appearance of scales and tests in various fields. The school survey
movement undoubtedly added impetus to the measurement movement, as did the
appearance of certain important books and periodicals to be referred to
later.

Duane C. Shaw, "A Study of the Relationships between Thurstone Primary
Mental Abilities and High School Achievement," Journal of Educational
Psychology 40:239-249, April 1949.

Leonard P. Ayres, ibid., page 12.


Studies in the unreliability of school marks and examinations.
But there was an additional factor which served as a very strong stimulus
to standard tests. Educators discovered for the first time just how bad existing
measurements were. Beginning about 1910, several studies in rapid
succession made this point convincingly clear. A distinction should be made
between the limitations of school marks and the limitations of school
examinations. The need for reform in college marking was forcibly brought to
public attention by Max Meyer, who reported on the marks collected
from forty instructors for a period of five years at the University of
Missouri. He found such astonishing variations as 55 per cent of A's in
philosophy and only 1 per cent in Chemistry III, while there were 28 per cent of
failures in English II and none in Latin I. Johnson found a similar
condition in the University of Chicago High School. In a two-year period he
found, for example, that the marks for German showed 27.1 per cent A's
and 8.4 per cent F's, whereas the marks in English showed 6.5 per cent A's
and 15.5 per cent F's. Such variations both at Chicago and Missouri could
be most reasonably interpreted on the supposition, not that English is
harder than foreign languages, but that English instructors are harder.
In other words, school marks are highly subjective, the mark received often
being more a function of the personality of the instructor than of the
performance of the student. Further studies showed similar results elsewhere
without exception. This was certainly disturbing, if not, as Thorndike
suggests, actually "scandalous."

But the evidence presented by a second type of study was even more
damaging. Variations among the final marks in different departments
might be accounted for, at least in part, by variations in the background,
intelligence, and application of the students in these departments. This,
at any rate, provided a comfortable loophole. But even this avenue of
escape was soon to be closed. Manifestly, such factors could not be
responsible for differences when several persons were marking the same student's
paper, and least of all when the same person marked the same paper on two
different occasions. But studies in abundance have established both
conditions.

Max Meyer, "The Grading of Students," Science 28, August 21, 1908.

Franklin W. Johnson, "A Study of High School Grades," School Review 19.

Perhaps the most striking of the early studies were those of Starch and
Elliott. In one of these studies Starch and Elliott used facsimiles of the
same geometry paper, which were marked by 116 high-school teachers of
mathematics. The values assigned ranged from 28 to 92. Manifestly, if
high-school teachers cannot agree any more closely than that in
mathematics, one of the most objective subjects, the situation is indeed bad.
Other studies tended but to confirm the suspicions. One of the most
spectacular was made by Falls, who had 100 English teachers mark a
composition by assigning it a percentage value and also indicating the
school grade in which they would expect that quality of work to be done.

TABLE 1

The Estimated Grade-Value and Percentage Marks Assigned to an English
Composition by One Hundred Teachers (after Falls)



In fact, the composition was the best obtained by a
survey committee in Gary, Indiana, a few years earlier, and was written
by a high school senior whose special interest was journalism and who was
a correspondent for some of the Chicago newspapers. It is not unreasonable
to suppose that many of these English teachers will never have as good a
composition submitted by one of their pupils, or that few of these teachers
could have written a better one themselves.

Evidence is available that examiners in other fields show variations fully
as startling as those reported in public education. Ten examination papers
written by applicants for licenses to practice dentistry in Kentucky were
submitted for regrading to the regular examiners on the official boards of
23 other states. The results are summarized in Table 2. The papers are
arranged in rank order from low to high (1 being highest) according to the
median judgment of the 24 examiners, who are designated by the letters
A to X according to degree of strictness in marking. That the variations
are enormous is indicated by several facts. With a minimum passing mark
of 75, it will be noted that every paper was passed by at least four
examiners, and failed by at least four other examiners. The most liberal
examiner, A, passed them all, while the two strictest, W and X, failed them
all. Seven different papers were rated by one or more examiners as the best
of the ten, while two of these seven papers were rated by other examiners
as the poorest of the ten. Surely such a situation can hardly be regarded as
anything but chaotic.

But Starch also presented the problem in a different and still more
unfavorable light. He found that college instructors assigned different
marks when they regraded their own papers without knowledge of their
former marks. In a later study Ashbaugh had 49 Ohio State University
seniors and graduate students, the latter with teaching experience, rate a
seventh grade arithmetic paper on a percentage basis three times at
intervals of four weeks between ratings. Some idea of the lack of consistency
in scoring can be gained when it is mentioned that only one student gave
the same total score on all three trials and only seven gave the same total
score on any two successive trials. The mean differences between pairs of
scores on successive trials were as follows: between the first and second
trials, 8.1 points; between the second and third trials, 7.3 points. In studies
by the writer using the same arithmetic paper he has found variations of
as much as 27 points on successive trials by the same scorer, and as much
as 10 points variation on values assigned to the first problem on two
successive trials approximately ninety days apart.
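
The kind of summary reported for such rescoring studies can be sketched as
follows. This is an illustration under stated assumptions, not the
computing procedure of the studies cited: the scores are hypothetical, the
function names are introduced here, and the "mean difference" is taken to
be the mean of the absolute differences between each scorer's two marks.

```python
def mean_abs_difference(first_trial, second_trial):
    """Mean absolute difference between each scorer's marks on two trials."""
    if len(first_trial) != len(second_trial) or not first_trial:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(a - b) for a, b in zip(first_trial, second_trial))
    return total / len(first_trial)


def count_unchanged(first_trial, second_trial):
    """Number of scorers who gave exactly the same mark on both trials."""
    return sum(1 for a, b in zip(first_trial, second_trial) if a == b)


# Hypothetical percentage marks given to one paper by four scorers,
# first trial and second trial four weeks later.
trial_1 = [70, 82, 65, 90]
trial_2 = [78, 75, 65, 85]

print(mean_abs_difference(trial_1, trial_2))  # 5.0
print(count_unchanged(trial_1, trial_2))      # 1
```

A perfectly consistent group of scorers would show a mean difference of
zero and an unchanged count equal to the number of scorers.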

For a fuller account of this investigation see the "Report of the
Research Committee on Examinations," National Association of Dental
Examiners.

Daniel Starch, "Reliability and Distribution of Grades," Science 38.

E. J. Ashbaugh, "Reducing the Variability of Teachers' Marks," Journal of
Educational Research 9:185-198, March 1924.



TABLE 2

Marks Assigned to Ten Dental Examination Papers by Twenty-Four Examiners



In a similar study Hulten found that 28 experienced Wisconsin high
school English teachers differed widely on trials at an interval of two
months in the values assigned an English composition which they thought
was written by an eighth grade pupil but which was really part of the
Hudelson Scale, at the time new and unfamiliar. He found that 15 teachers
who gave passing marks the first time would have failed the pupil the
second time the paper was marked, and that 11 teachers who gave failing
marks the first time would have passed the pupil the second time. Studies
involving English composition are especially significant because every
essay examination is a series of compositions, and when English teachers,
who presumably have more than ordinary skill in this field, can agree neither
with other teachers nor with themselves a second time, the situation is very
serious.


In February, 1918, Thorndike published what has proved to be probably
the most influential paper that has ever appeared on educational
measurements. It began with the well-known dictum, "Whatever exists at all exists
in some amount," and ended with this note of satisfaction: "Of the gains
made in the past decade, we may well be proud." As he looked into the
future, Thorndike saw it conditioned by a series of if's:

If those who object to quantitative thinking in education will set themselves to
work to understand it; if those who criticise its presuppositions and methods will
do actual experimental work to improve its general logic and detailed procedure;
if those who are now at work in devising and in using means of measurement will
continue their work, the next decade will bring sure gains in both theory and
practice.


We shall now take a look at what really happened in the years following
Thorndike's statement of possible achievements.

Progress since 1918. According to Buckingham, it was in 1919 that
"test-making passed from an amateur to a professional basis." A good
summary of the next decade has been made by Monroe, to which reference
has already been made. The monograph begins with the assurance that the
pioneer state of educational research is passed and that "quantity
production" has been achieved. And, as is to be expected, much of the output is
not up to the highest quality, when judged by modern standards. Monroe,
however, detects some evidence of a growing conviction that the emphasis
should be upon quality of work rather than upon mere quantity.

C. E. Hulten, "The Personal Element in Teachers' Marks," Journal of
Educational Research 12:49-55, June 1925.

Nor is the situation peculiar to America. Studies reported in Europe reveal
conditions fully as bad. In England, for example, examiners were found to
change their judgments considerably when they were asked to mark again the
same papers they had scored a year before. See School and Society 44:364,
September 19, 1936.

Edward L. Thorndike, "The Nature, Purposes, and General Methods of
Measurements of Educational Products," Seventeenth Yearbook of the National
Society for the Study of Education. Quoted by permission of the Society.

Walter S. Monroe and others, Ten Years of Educational Research, 1918-1927.
Urbana: University of Illinois, 1928.

Moreover, by 1927 there were already developments in new directions
which represented a distinct advance. The earlier standard tests of
achievement were largely of the general or survey type, which afforded a general
all-around measure of the pupil's attainment in the subject, but which did
not give the detailed information required for remedial work. The next
decade saw the development of various achievement tests of a specific type.
For example, there appeared in several fields diagnostic tests, whose
function was to give specific information regarding the pupil's strong and weak
points. Also practice tests were developed, especially in arithmetic, whose
primary function was not so much measurement as drill. Another important
development of this period was the organization of tests into batteries made
up of survey tests in the more important subjects, all published in a single
booklet. In 1920, two such batteries appeared, one by Pintner and the
other by Monroe and Buckingham. Two years later appeared the first
edition of the well-known Stanford Achievement Test, which, with
successive revisions, has continued to set a high standard.

There was also a rapid development of high school tests in the major
academic subjects. Even today, however, measurement in high school can
hardly be said to have kept pace with that in the elementary school. There
has also been some activity, but less marked, in the development of
achievement tests on the college level.

There still remained at the end of the first decade of standardized tests
an important need that had not been met. Confidence in the ordinary
school examination had been seriously undermined by such studies as those
to which reference has already been made, and as yet no suitable substitute
had been found. Also, there were many fields, especially in high school and
college, where there were hardly any standard tests. Even in the subjects
more fully provided with such tests, they were by no means adequate to
supply the needs of the classroom teacher. Furthermore, standard tests
represented a considerable item of expense which school boards at that time
were often reluctant to assume. The so-called objective, or new-type, test was
devised to meet just this need. McCall seems to have been the first to
suggest an adaptation by the classroom teacher of the form of the test
items used in the standard test. Such tests were not standardized. Soon they
were widely and often uncritically used.

[...] has given a brief summary of the measurement movement for


I o-e-th r M ir'januao ool EesminsO o /jv-nal c/ Eiu'dhet 





the quarter of a century beginning in 1920. It was during these eventful
[...] from early adolescence to early [...]

Improved examinations, children of necessity. Attention was called
earlier in the chapter to the role of necessity in the development of
intelligence tests. Much the same influence is evident in the development of
improved measurement of achievement. The origin of the objective test
referred to above is a case in point. Three other instances will be cited
briefly.


It was customary in the early days for the school committees in
Massachusetts to give oral examinations in the schools under their control.
By 1845 the enrollments had become so large in Boston that the committee
could no longer devote the time required for anything more than the most
casual examination of each pupil with an oral quiz. To meet this situation
the uniform written examination was adopted. The results were so
gratifying that Horace Mann wrote the enthusiastic defense of written
examinations to which reference has already been made.
In the latter part of the last century considerable pressure was being
brought to bear from the outside upon the schools to make place for certain
new and practical subjects such as manual training and home economics.
But the school men opposed the move on the ground that there was hardly
time to teach the subjects already in the curriculum. Then, in 1894,
Dr. J. M. Rice had what he called an "inspiration." He says:


In truth, however, I came to recognize that this [the claims of school men
following different courses of study] was all talk, that no one really knew the facts
because there were no standards to serve as guides. Then one day the idea flashed
through my mind that the way to settle the question was to try it out. For a
beginning I decided to take spelling, and on that very day I made up a list of
50 words with the view of giving them as a test to the pupils of the schools as I went
on my tour from town to town. I have no record of the date of the inspiration, but
I think it was some time in October, 1894.


[...]ment that not only transformed the teaching of spelling but brought to the
fore a new technique for settling educational issues.

The schools of every period have apparently had to meet the criticism
that they are not so efficient as those "in the good old days." Usually, again,
there is no defense except argument based upon mere opinions. The
criticism was especially severe in the early years of the present century. Just at
this time, in 1906, a fortunate event occurred which taught educators a
second lesson in the value of comparative examinations. John L. Riley of
Springfield, Massachusetts, discovered in an old attic a set of examinations
which had been given in the Springfield schools in the year 1846. The
thought occurred to him to give these same examinations to the pupils
'■.(loolTdbyLcinarlr tire ea nl p.p, 11 OaCed bj of the Oocatj 





of the same city in 1906, just sixty years later. In spite of the changes in the
content of the subjects, he found the results distinctly favorable to the
later schools. In ninth-grade spelling, for example, the pupils in 1846 had
averaged 40.6 per cent, while the average was 51.2 per cent in 1906. In like
manner, the geography average had risen from 40.3 per cent in 1846 to
53.4 per cent in 1906. But the greatest superiority was in the case of
arithmetic, where the increase was from an average of 29.4 per cent in 1846 to
61.2 per cent in 1906. It was evident, therefore, that the facts were the most
effective tools with which to meet criticism, and that comparative
examinations were very useful in supplying these pertinent facts.

D. The History of Character, Personality, and Interest
Measurement


It is probably true that human beings began to [...] evaluation of each other's
[...] history. But from the standpoint of measurement, these early efforts were
both [...] somewhat later these methods [...] chance [...] which have exerted
wide influence upon a credulous public [...] graphology, palmistry, and
phrenology [...] character, the scientific study [...] Nor can it be said that the
earlier pseudo-[...] influence the popular mind [...] many other aspects of
measurement. More than [...] the conclusion that the character which shapes
our [...] definite and durable "something" and that it is therefore [...] measure
it. He proposed rating scales with statistical [...] results, and what he termed
"rude experiments" suggested many [...] investigations. Without doubt Galton's
ingenious suggestions marked the beginning of the scientific measurement of
character.

personality ard''rh!.Tac"er were measurement of 

Jl' J'*" H. TheHoHeaPalen. 

•“xrver »t>e lay public but 

‘^n.rter 




[...] mental hygiene, and character education [...] that success along these
lines was conditioned upon the ability to measure other things besides general
intelligence and [...] to character education, for example, one of the leaders in
that field, Lentz, said: "Character education without
character measurement would appear to be as logical as target practice
in the dark, good shots and poor ones being equally gratifying."

The first attempt to measure character by a test was probably that of
Fernald in 1912, but the author's claims for the test were very modest.
Voelker, in 1921, devised some actual test situations for measuring
character. By far the most ambitious attempt so far made is that of the
Character Education Inquiry, under the direction of Hartshorne and May,
which extended over the five-year period, 1924-1929. These workers
subjected all the promising tools then in existence to rigid trial and devised
many new and ingenious techniques of their own. Their main effort was
directed at selecting representative and varied life situations which would
afford a valid index of the totality of the character of the individual.
Most of these methods, however, had had interesting historical
antecedents. For example, the celebrated physician Galen, who lived in the
second century, employed methods which were not unlike those in current
use. On one occasion Galen, employed by the emperor to find out whether
the empress was in love with a certain courtier, attempted to do so by
having the suspected lover appear in the presence of the empress while the
physician felt her pulse to determine the change in heart beat. It will be
noted that Galen's technique suggests modern physiological methods of
blood pressure, psycho-galvanometer, and the like, as well as the
"sampling" method of the performance tests.

It is doubtful whether, as a rule, tests of actual performance have
sufficiently demonstrated their superior validity and reliability as measures of
character and personality to justify the additional expense and inconven-


31 Some idea can be had of the extensive literature on the subject from examining the
annual volumes of the Review of Educational Research. A selected bibliography on
character, including 282 titles selected from a complete list of about 1[...] titles, appeared
in 1932. A supplementary bibliography for the next three years, which appeared in 1[...],
included over 400 titles. Analogous chapter titles in the 1953 "Educational and
Psychological Testing" issue are "Development and Applications of Nonprojective Tests of
Personality and Interest" and "Development and Applications of Projective Tests of
Personality." Character measurement per se receives very little [...] However,
see Vernon Jones, "Character Development in Children: An Objective Approach,"
Chapter 14 in Leonard Carmichael (Editor), Manual of Child Psychology. New York:
[...]

Theodore F. Lentz, Jr., An Experimental Method for the Discovery and Development
of Tests of Character, page 2. New York: Bureau of Publications, Teachers College,
Columbia University, [...]

[...] Delinquent Class: Differentiating Tests," American [...]

A complete report was published by The Macmillan Company in three [...]





ience involved in their administration, except for purposes of research.
In his summary Goodwin Watson says: "It is probable that the last five
years have brought some swing of psychological interest away from
personality-test techniques and toward more emphasis upon the study of
personality through ratings, anecdotal records, observation of behavior,
and case studies." In the final chapter of Symonds' comprehensive and
analytical book, Diagnosing Personality and Conduct, the author makes
this statement: "Probably the greatest usefulness will be found in ratings,
the questionnaire, and the interview for obtaining evidence as to
adjustments toward the environment, personal evaluation, attitudes toward
reality, sexual relationships, morals, and feelings." The same author notes
clear evidence of a lag both in clinical research and in the translation of
this research into educational practice. He says:


acheTLlni'w’''' from 1910 to 1020 on mental and 

Basic wSn the assimlaW educational practice m the 1920 s 

tion m the schoSs „?the ' assmnlalcd m the practice of clue. 


^Uo?relv1"btercans1 nuesfonnairea, and intarviCMS 

questionnaires are only deTOS^firre^ori 
rather than true tneasur^iT^eTt^^ 

IhatTcnUonTor ■" “ 

the tune of the appearance of P^W'^ed in 1883 About 

a scale for judging intelligence One of the moTr proposed 

for mca^unng traits of npr«nnt»i.*, au ® famous scales apecificallj 
duced and extensivelj used dunng'worlH ' to* Man-lo-Man Scale, intro- 
Shornc and May to 4.or7sorwLTSl ^ «='«- 

p __Apph,„„ f'- vort 

"IhrccM SvmonU, Tn, ff Appleton Century Com 

SnT'cl'lLIrT" «-''>■ " ^--1 0/ r^ucnlionnl tto 

' -A '"—-.Chapten ^ce 

- -...hon. oner. .„„.kee ,„u„„„o„ 





The questionnaire. The invention of the much-maligned questionnaire
is often ascribed to that versatile Englishman, Sir Francis Galton. But that
the instrument was in existence at an earlier period in England, in fact
if not in name, is evident from the following critical statement:


It is impossible to expect accuracy in returns obtained by circulars, various
constructions being put upon the same question by different individuals, who
consequently classify their replies upon various principles.


But Galton undoubtedly improved and used extensively the questionnaire,
which was imported to America about 1880 by G. Stanley Hall. The
instrument exists in many forms today, but its principal use in education is
for measuring adjustment, attitude, and interest.
The Woodworth Personal Data Sheet began in 1917 as a method of
measuring the ability of soldiers to adjust themselves to the trying
conditions of army life. In 1923 Mathews adapted it for school use. The same
year Cady published another revision which has been widely used with
children in their teens. Two years later Laird made an adaptation for
college use and provided a graphic rating scale for scoring. In 1919 Pressey
published his widely used X-O Test, a sort of questionnaire covering a
miscellaneous assortment of items having to do with emotionality. The
questionnaire has also been used to measure other types of adjustment,
such as introversion-extroversion by Marston and others, and
ascendance-submission by Allport.

The development of wholesome attitudes has been recognized in recent
years as an important objective of education. Since about 1920 much
attention has been devoted to the measurement of attitudes of various kinds.
Hart's test of social attitudes and interests, which appeared in 1923, and
Watson's measurement of fairmindedness, which appeared in 1925, are good
examples. Beginning in 1928 Thurstone has been responsible for important
improvements in the units of measurement employed in attitude
questionnaires on many subjects. By having his questions scaled by a group of
judges, Thurstone has found it possible to secure satisfactory results with
fewer items. Figure 3 shows an adaptation of this weighting technique
applied to a scale for measuring a pupil's attitude toward high school.


necessity and invention. For a while it seemed that rating scales as scientific
instruments would be completely discarded. It was necessity that saved the day. While
everyone talked about the superiority of objective tests, yet it was soon found that
many qualities of character yield only stubbornly and expensively to objective testing.
If character and personality studies were to continue, ratings had to be revived. In spite
of all their difficulties, snares, delusions, and pitfalls, they are now staging a considerable
"comeback." The Journal of Social Psychology [...]

40 Quoted from Journal of the Statistical Society of London [...] by Walter
S. Monroe and Max D. Engelhart, The Scientific Study of Educational Problems [...]
New York: The Macmillan Company, 1936.

41 For a discussion of this scale see H. H. Remmers [...] and F. H.
Gillespie, "Measuring Attitude Toward the High School," Journal of [...]
Education, 2:[...]-64, September, 1933.





In this case the scale values range from 6 for Item 1 to 10.1 for Item 22.
The pupil's score is the median scale value of the items checked.
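
The scoring rule just stated (the score is the median scale value of the checked items) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `attitude_score` and the scale values in `SCALE_VALUES` are hypothetical and invented for the example; they are not the published values of the High School Attitude Scale.

```python
from statistics import median

# Hypothetical item-number -> scale-value table. In a Thurstone-type scale
# each statement carries a value derived from the sortings of a group of
# judges; the numbers below are invented for illustration only.
SCALE_VALUES = {1: 1.0, 2: 8.9, 3: 6.5, 4: 9.4, 7: 2.0}


def attitude_score(checked_items, scale_values=SCALE_VALUES):
    """Return the median scale value of the statements the pupil checked."""
    values = [scale_values[item] for item in checked_items]
    if not values:
        # A blank paper has no median; refuse to invent a score.
        raise ValueError("no statements were checked")
    return median(values)


if __name__ == "__main__":
    # A pupil who checks items 2, 3 and 4 gets the middle value of
    # 6.5, 8.9 and 9.4.
    print(attitude_score([2, 3, 4]))
```

Using the median rather than the mean keeps one stray check mark on an extreme statement from pulling the score far from the pupil's typical endorsements, which is presumably why the scale is scored this way.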
The close relationship of interest to guidance, whether educational,
vocational, or personal, has stimulated considerable activity directed toward
its measurement. In 1907 G. Stanley Hall published a questionnaire study
of the recreational interests of children, which exerted a wide influence.


HIGH SCHOOL ATTITUDE SCALE*

Form B

Below is a list of twenty-five statements about school. Place a check mark
before each statement with which you agree and leave unmarked those with
which you disagree. This test will in no way affect your standing in school.

____ 1. School is like a prison.
____ 2. I have a lot of fun in school.
____ 3. School is all right.
____ 4. My teachers always treat me fairly.
____ 5. I like to go to school to be with other people.
____ 6. Many of our great men have no high school education.
____ 7. I hate most school work.
____ 8. Some things about high school are all right.
____ 9. High school training develops personality.
[...]
____ [...] If one has plenty of money it may be all right to go to high [...]
[...]





This pioneer study has been followed by many others, of which the most
extensive is probably that of Lehman and Witty, published in [...]. [...]
appears to have been the first to use this technique for the measurement
of vocational interests in 1921. Two other studies, employing somewhat
different techniques, appeared in 1926. One of these, by Miner, offered
paired comparisons, and the other, by Cowdery, employed a complicated
scheme for weighting the scores. The best known of all are the Strong
Vocational Interest Blank, which appeared in 1927, and the Kuder
Preference Record, which followed in 1939.


Paper-and-pencil personality inventories have been severely criticized
in recent years, especially by a prominent clinical psychologist, Albert
Ellis. He speaks out against the more elaborately scored and tediously
interpreted instruments of this kind:


since personality inventories depend in the last analysis on printed
questions, and since virtually no one claims for them the advantage of getting at
unconscious and semi-conscious material which effective clinical use of interview and
projective methods are generally conceded to some extent to uncover, it is difficult
to see why clinicians should spend considerable time first mastering and then using
these inventories.


The interview. One of the oldest forms of obtaining knowledge is the
personal interview. It has always been an important tool in the hands of
certain professional men, such as lawyers, doctors, and newspaper
reporters, but it is used by everybody to some extent. Its chief value in
education is probably in diagnosis and guidance. The interview is used
to supplement the ordinary objective evidence about a pupil, such as is
afforded by his school record, and by a firsthand knowledge of such things
as his feelings and point of view. The evidence indicates that the personal
qualities of the interviewer are fully as important as the technique
employed.

Recent developments. Three promising developments relating to
measurement require brief mention. The first of these is the attempt to subject
personality to elaborate statistical analysis. Leaders in this movement have


In "The Validity of Personality Questionnaires," Psychological Bulletin, 43:385-440,
September, 1946. Albert Ellis and Herbert S. Conrad are somewhat more favorable
to them in "The Validity of Personality Inventories in Military Practice," Psychological
Bulletin, 45:385-426, September, 1948.

Albert Ellis, "Recent Research with Personality Questionnaires," Journal of
Consulting Psychology, 17:45-49, February, 1953, page 48. Quoted with permission of the
Journal of Consulting Psychology and the American Psychological Association. For a
rebuttal see Allen Calvin and James McConnell, "Ellis on Personality Inventories,"
Journal of Consulting Psychology, 17:462-464, December, 1953.

Thorndike has described the Stanford-Binet as an [...]
standardized interview. The Measurement of Intelligence. New York: Bureau
of Publications, Teachers College, Columbia University, 1927.

Valuable suggestions on the interview are found in Marie Jahoda, Morton Deutsch,
and Stuart W. Cook, Research Methods in Social Relations, Parts I and II, Chapters
[...]. New York: The Dryden Press, 1951.





been Spearman, Kelley, Thurstone, Flanagan, R. B. Cattell, Eysenck, and
Stephenson. The second development is that of controlled observation or
time-sampling, which is employed mainly in the child study laboratories.
A third development has been in the direction of measuring public opinion.

E. Some Important Publications


From the beginning professional journals, books, and other publications
have exerted a profound influence upon experimental psychology and the
testing movement. Only the most important can be mentioned here.
Professional journals. The value of professional journals is that they
keep the professional worker in one area continuously informed of what
is going on in his own area and elsewhere. The first psychological journal
was Mind, founded in England by Bain in 1876. For eleven years it
remained the only psychological journal in the English language, and so was
the vehicle for most of the important psychological articles both in England
[...] the most important articles on measurement during [...] "Mental Tests
and Measurements," [...] appeared in 1890 and contained some significant
comments by [...] psychological journal in America was the [...] G. Stanley
Hall in 1887. It has [...] measurement and statistics, but doubtless none more
[...] "General Intelligence Objectively Determined and Measured," [...]
appeared in 1904. This was the original formulation of [...] intelligence and
[...] which was influential in directing the attention of [...] Hall also founded
Pedagogical Seminary [...] Genetic Psychology), which may be regarded as
[...] journal of educational psychology, in 1891. Three years later [...] started
L'Année Psychologique, which was to be the principal [...] his extensive work
on the measurement of intelligence [...] attention of the world. The
Teachers College Record [...] many of Thorndike's important studies, and
those of [...] and co-workers. The first volume of the Journal of Educational
Psychology [...] in 1910, and con-

' ^ Princeton nn, 

Herbert S ^ Cre«p. Opinion 

Atlitu Je AtUtud^\lMi,L November liM6 

»ponvi to Cn-pi. -7(^5S9 November ^ *° Opinion 

MclhodoJogy, 41 m }7?^’'itf"'* Conrada Reply to '^•cNemar Re- 

' ”1-176. March, 1917 1° Appmual of Opinion Altitude 




tained an article by Huey on "The Binet Scale for Measuring Intelligence
and Retardation." Two other journals, School and Society and Educational
Administration and Supervision, both founded in 1915, have included many
reports on the use of tests both for research purposes and for the actual
work of instruction and school administration. But probably no journal has
[...] of Educational Research, started
in 1920. From the very first issue, which contained McCall's article on
"A New Kind of School Examination," to the present time, it has exerted
a wide influence upon the measurement movement.
Psychometrika, which began publication in 1936 as "a journal devoted
to the development of psychology as a quantitative rational science,"
carries many articles applying mathematical procedures to measurement
problems. On a less mathematical level is Educational and Psychological
Measurement, begun in 1941. It is particularly valuable for the reasonably well
trained guidance worker. Even less technical is the Personnel and Guidance
Journal, formerly called Occupations. Two other publications whose
articles frequently concern tests are the Journal of Applied Psychology and
the Journal of Consulting Psychology.
Measurement is such an important topic that practically all professional
journals, as well as many "popular" magazines, deal with various aspects
rather often.

Some important books. Some of the important books are briefly
mentioned in chronological order, beginning with the pioneer period, which
included roughly the first two decades of the century. In 1904 appeared
E. L. Thorndike's An Introduction to the Theory of Mental and Social
Measurements, which made available for the first time to American
students the statistical techniques necessary for educational research and
measurement. Ten years later Truman Kelley's Educational Guidance
introduced educational workers to the alluring possibilities of partial and
multiple correlations. Two important books appeared in 1916. One of these,
Daniel Starch's Educational Measurement, was the first book on
achievement tests, and the other, L. M. Terman's The Measurement of
Intelligence, was the first adequate treatment of intelligence tests in the English
language. In 1918 appeared the Seventeenth Yearbook of the National Society
for the Study of Education, Part II of which treats in some detail the
history of the pioneer period in testing, and which gives descriptions of existing
tests with suggestions as to their use. But it is probably most famous for
containing Thorndike's statement, "Whatever exists at all exists in some
amount," which has been accepted as a sort of creed by many workers


Published by Bureau of Publications, Teachers College, Columbia University.
Published by Bureau of Publications, Teachers College, Columbia University.
Published by The Macmillan Company, New York.
Published by Houghton Mifflin Company, Boston.

Published by Public School Publishing Company, Bloomington, Illinois.




in the field. The following year, in 1919, appeared Carl Seashore's The
Psychology of Musical Talent, a pioneer study in the measurement of
aptitude in a restricted field.

Since the beginning of the third decade of the century, the "quantity
production" stage referred to by Monroe has been achieved not only in the
publication of tests but in books as well. Only a few of these can be
mentioned here as representative of types. In 1922 appeared W. A. McCall's
How to Measure in Education, a comprehensive and critical book on
achievement tests. The next year saw the publication of Ben D. Wood's
Measurement in Higher Education, the first treatise on measurement at
the college level. In 1924 appeared G. M. Ruch's The Improvement of the
Written Examination, which was the first book wholly devoted to the
new-type test. The year 1927 was especially productive, for at least five
important books on measurement bore that date of publication. There were two
notable books on intelligence, E. L. Thorndike's The Measurement of
Intelligence and C. E. Spearman's The Abilities of Man, each representing
a distinct point of view. In 1927 also appeared the first two books
specifically devoted to measurement in the high school, P. M. Symonds'
Measurement in Secondary Education and G. M. Ruch's and G. D. Stoddard's
Tests and Measurements in High School Instruction. The same year Truman
Kelley's Interpretation of Educational Measurements [...] appeared another
critical [...] become one of the classics in the field of measurement.

[...] numerous books and monographs on [...] and their application to the
different [...] coming of age is [...] Hildreth's Bibliography of Mental Tests
and Rating Scales [...] in 1933. Three [...] and Personality Tests of 1933,
1934, and 1935, the forerunner of The Mental Measurements Yearbooks, the
first volume [...] in 1938. The publication

" PublE Ne» York 

'• Pubb.bcd by n o' Id St f ™ York 

P KS !;^y 

h?; S: yS"'""' “■“'’"“'‘j’ 

I’obb.htU by World SC Yonten 

"I>ubb.ho,Ibj Tl"p,S5^"Ph")’ 
o Pobh-hcJ br Ilultor. Uoum,y^;S'”'!'“' n’" 

I oblol 0,1 by lioiBor. tn.vm.ij vTZ ^ “'""""’k. New Je,„y 





of this critical volume marked an important milestone in the history of
educational measurement. The Nineteen Forty Mental Measurements
Yearbook, The Third Mental Measurements Yearbook (1949), and The Fourth
Mental Measurements Yearbook (1953) are Buros' invaluable sequels to
these earlier works.


David Wechsler's The Measurement of Adult Intelligence, a
comprehensive manual accompanying the first individual intelligence test designed
especially for testing adolescents and adults, was first published in 1939.
The Wechsler-Bellevue Intelligence Scales have been well accepted,
particularly by clinicians. The Wechsler Intelligence Scale for Children emerged
in 1949 as a serious competitor of the Revised Stanford-Binet Intelligence
Scale in the age range 5-15 years.

Though World War II temporarily interrupted the publishing efforts of
most measurement specialists, during the post-war years a veritable deluge
of excellent books hit the market in rapid-fire succession. Among these
were Frederick B. Davis' Utilizing Human Talent and Dorothy C. Adkins
and others' Construction and Analysis of Achievement Tests (1947), Lee J.
Cronbach's Essentials of Psychological Testing, Florence L. [...]
Mental Testing, William Stephenson's Testing School Children, Donald E.
Super's Appraising Vocational Fitness, [...] Personnel
Selection: Test and Measurement Techniques (1949), Frank S. Freeman's
Theory and Practice of Psychological Testing and Harold Gulliksen's
Theory of Mental Tests (1950), and E. F. Lindquist and
[...]
[...] greatly facilitate research, putting
[...]
which appeared during those two decades.


Published by the Gryphon Press, Highland Park, N. J.
Published by Rutgers University Press.

Published by [...] Company, Baltimore, Maryland.

" S The 4”fn:S'St»Skod ft^L^lo^ Measurement lest- 

.1 Published by the An>|"e“ ppppug office, Washington 25, D C 

S l- ri-Gover^'t mn.ma Office, M w-hmston -o, O C . 1. 





F. Some Relatively Recent Tendencies

Test construction. The "quantity production" stage of test construction
in America now seems definitely past. The emphasis has turned to
quality, although it is still too much to say that a recent copyright date
[...] Kelley's observation made in
1927 that "the ruts of the test movement are already so deep that there are
many who do not see beyond them" is still, unfortunately, true. However,
test makers as a group no longer unblushingly make the enthusiastic
claims for their products that were common a few years ago. Instead there
[...] increasingly modest attitude, which is
[...] characteristic feature of the present trend. One alert
observer [...]
attitude [...]
several [...]
[...] attention to
the reliability, and more particularly to the [...]
test items is also a notable trend. The result has been the appearance of new
types of test forms and test situations [...] realization
that standard tests do not fully [...]
that in consequence [...]
of improved techniques for constructing informal teacher-made tests and
other techniques of evaluation.

Monroe makes the following excellent summary of the situation:

the future’ Am ?»f /le'eJopnient of diagnostic an? instruments t\ere 

•..ti!?-™. to protect prognostic uses -UTiat of 


the future’ Am “e'elopment of diagnostic instruments t\ere 

u"’i.‘reE;;= 

“*> 13 the entenoa for eialuatin: 


"’Tm,aa„I,,,Kdl„ "■‘c™a for eialual.ng 

tt oiW Hook Compinj ^1907"^'^'^^'“”’ ®/ FJuoato,,^ Reirnom, 

“ttshor S Moniio 1,1 '“"'rnCTIi page 10 Vonlen. 

t lim .w as 'fcarmemnit m 

I.:.''’'";,!.'.'.”-. 


alters Moo,.^' ■’““•'3 1015 >020 and 1915" JmnuA 0 / 

1 -a-rii, I,„or., "T:;-'' MH N„ 1 ‘‘u’iLt 'l" T' »' 

I V" im nna Uurt-au of 




Use of tests. A British psychologist has suggested that a new scientific
technique seems to go through three stages, as follows:

[...] of development when no one except its inventors
is interested in it and those working on other lines regard it with indifference or
suspicion or else think it silly. In the second stage it begins to gain support, and
in the third stage everyone wants to use it whether they understand it or not.
There is then danger of a fourth stage of disillusionment, and this is the time for
critical examination.


In the case of standard tests in America the stage of indifference and
suspicion, with which Rice's spelling inquiry was met, had largely passed
when the first standardized tests appeared during the first decade of the
present century. Since that time there have been three rather clearly
defined stages, which may be designated as those of curiosity, confidence, and
critical caution.


The first stage was that of curiosity. In this stage teachers and school
officials tried out tests merely because they were something new and
because their use gave evidence, if indeed superficial in character, of
up-to-dateness. This attitude tended to die a natural death as the novelty wore
off.

The second stage was that of confidence, or in some instances that of
overconfidence. Standard tests were "swallowed, hook, line, and sinker."
Test results were uncritically accepted at their face value. IQ's were naively
taken as accurate measures of innate capacity wholly apart from
environmental opportunities, and so were as fixed as the laws of the Medes and the
Persians. In like manner, achievement test scores were accepted as fully
adequate measures of the important outcomes of instruction. If only such
tests were objective, they were assumed to be sufficiently accurate for valid
comparisons, not only of one school or class with another but also of one
pupil with another, or even of one aspect of a pupil's achievement with
another aspect of his achievement. There is some evidence that this atti-


>«J 0 Irwin ‘C--rrcUiyn MethKlB ui rsvjMlocv Iln'ish Jaurttal of Psycholm 

happened n mtpJJifiPnPi followinR World War I liaa letn dc^crihef 

is follows A- nnnv if the suhiccis -'bjldirn of srhwl agt l^au«e I3.net ^ 

scores gave a good orrelatioi. with ability for wboil work and perJiar- b«au«e of tl.. 
relative sunphc.ty and economy of themu^ods met Ul te-l.ng w.s 
psvcholoMcal work lu th. field of mdividiial UiBerei.ci-s still siifTcts from 
Francis N MaxfieJd ‘ Trends jd Tesuug lot-l'ifecnpe Lducntwnal Research BulUtin 

haoLoed^o athievement testing has been destribcil as follows In the 
Widespread usS objective tests at the high school and college W-U 
a child like faith m the e/ficacy of objec^ teste « uflS S 

achievement A little knowledge has I-ecome a dangerous thing Wnocr 



58 


THE PROBLEM OF MEASVREMENT 


tude IS on the decline, although unfortunately it is still found too often in 
certain quarters 

The third stage may be termed that of critical caution. While by no means universal, this more wholesome attitude, on the whole, characterizes the later phases of the testing movement. Hildreth points out some beneficial results of this change: “A more critical attitude toward intelligence measurement, as the outcome of continued experimentation, has resulted in more authoritative research findings, more sensible and intelligent interpretation of data. This attitude has shown itself with respect to achievement tests and personality measurements as well. The result has been not so much the curtailment of the use of tests as their more critical use and the more cautious interpretation of test scores. On the whole the current emphasis is on the use of standard tests not so much for comparative purposes ... instruction. It is ...

... data from intelligence tests ... individual pupils, the case study ... inventories of personality, scales of social ... psychologist will prevail ... achievement tests, tests of intelligence ... will supplement ... diagnostic teaching. But no synthesis or interpretation will ... without knowledge of the pupil’s physical condition, his home background, his previous school history, his vocational interests, his social and ... weight given in this synthesis ... problem presented. The case study ... education and to any educational aims ...

Lindquist concludes his excellent ... “Considerations in Objective Test Construction” ... a strong warning ... work will prove futile ...



Selected References for Further Reading

Boynton, Paul L., Intelligence: Its Manifestations and Measurement. New York: D. Appleton-Century Company, Inc., 1933. Chapters V and VI.

Cook, Walter W., “Achievement Tests,” in Walter S. Monroe (Editor), Encyclopedia of Educational Research, pages 1461-1478. New York: The Macmillan Company, 1950.

Cook, Walter W., “Educational Measurement in the Education of Teachers,” Journal of Educational Psychology, 41:339-347, October, 1950.

Gardner, Eric F., “Development and Applications of Tests of Educational Achievement in Schools and Colleges,” Review of Educational Research, 23:85-101, February, 1953.

Freeman, Frank N., Mental Tests: Their History, Principles and Applications (Revised Edition). Boston: Houghton Mifflin Company, 1939. Chapters I-VIII.

Moody, Caesar B., “Historical Outline of Concepts of Mental Ability,” Peabody Journal of Education, 30:194-204, January, 1953.

Odell, C. W., Educational Measurement in High School. New York: D. Appleton-Century Company, 1930. Chapter II.

Peterson, Joseph, Early Conceptions and Tests of Intelligence. Yonkers, N. Y.: World Book Company, 1923. 320 pages.

Pintner, Rudolph, Intelligence Testing: Methods and Results (New Edition). New York: Henry Holt & Company, 1931. Part I.

Rice, Joseph M., Scientific Management in Education. New York: Hinds, Noble & Eldredge, 1913. 282 pages.

Ruch, G. M., The Objective or New-Type Examination. Chicago: Scott, Foresman & Company, 1929. Chapters I and III.

Rugg, Harold O., Foundations for American Education. Yonkers, N. Y.: World Book Company, 1947. Chapter XXIII, “Fifty Years of Scientific Method in Education: What Have We Learned?”

Thurstone, Louis L., and Chave, Ernest J., The Measurement of Attitude. Chicago: University of Chicago Press, 1929. 96 pages.

U. S. Office of Strategic Services, Assessment of Men. New York: Rinehart & Company, 1948. 541 pages.

Young, Kimball, “The History of Mental Testing,” Pedagogical Seminary, 31:1-48, March, 1924.



3 

The Statistical Analysis of Test Results 


A. General Considerations

The importance of statistics. Measurement and evaluation are coming increasingly to depend upon statistical procedures. Almost all test manuals discuss central tendency, variability, percentiles, standard scores, reliability, and validity, usually presuming that the reader understands certain commonly used statistics fairly well. Educational literature of all types contains such concepts and statistics; they are also mentioned at professional meetings. Workers in all fields of education can anticipate heightened emphasis upon statistical thinking and techniques.

The view taken by Helen Walker is even broader:¹

The conclusion seems inescapable that some aspects of statistical thinking which were once assumed to belong in rather specialized technical courses must now be considered part of general cultural education.

Statistics in a capsule. Nearly every elementary measurement text, most general psychology books, and many other introductory volumes contain ... impossible, that is, to teach in a week or so material usually covered in a quarter, semester, or year-long statistics course. Though some of these chapters are very good indeed, failure to attain their unrealistic goal is ... to teach very much statistics in a short period of time, nor is it practicable to devote a large portion of the measurement course to this area.

¹ Helen M. Walker, “Statistical Understandings Every Teacher Needs,” pages 207-215 in Improving Educational Research, Official Report of the American Educational Research Association. Washington, D. C.: National Education Association, 1948. ... Teachers College Record, 49: ..., April, 1948 ... High School Journal, 33: ..., January, 1950.

Other similar articles are ... “The Need for Statistical Education in High School and College,” The Educational Record, 29:72-80, January, ...; Douglas E. Scates, “There Is a Place for Statistics in General Education,” Nation’s Schools, 41: ..., February, 1948; Douglas E. Scates and W. Edwards Deming, “Education in Statistics for Participation in Current Affairs,” School Review, ...; and ... “Thinking Straight about Facts and Figures,” ...
The writers have tried to solve this dilemma by omitting the more advanced material, by putting emphasis on concepts rather than computations, by repeating main ideas frequently, and by saving certain techniques for the Appendix. Fifty multiple-choice items are presented in Appendix A, pages 429-435, to help the student test his grasp of basic principles. A summary of common statistical terms and a selected list of statistics textbooks appear at the end of the chapter.

Despite these aids, the reader will find that he cannot read the chapter like a light novel. It will require careful study, much as one would prepare chemistry or physics assignments. However, this effort should result in permanently improved ability to understand some forms of statistical communication, though real mastery cannot be expected. Very likely your teacher will want to be quite selective in his emphasis upon the various topics in this chapter. He may skip some of them entirely.


B. Classification and Tabulation 


Before test scores or other quantitative data can be comprehended and interpreted, it is usually necessary to summarize them. Table 3 gives a class record for a reading readiness test administered at the beginning of the school year. The scores appear in alphabetical order as they are recorded in the teacher’s class roll book. However, the scores do not mean very much in this form. It is with some difficulty that one can tell whether Richard A., with a score of 90, for example, is a very superior or just an average pupil.

Rank order. Ordinarily the first step is to arrange the scores in order of size, usually from high to low. This is called an ungrouped series. In a small class, this is sometimes all that is necessary. Table 4 gives the same scores as Table 3 arranged in order of size. This table also shows the rank order of the pupils and the scores tabulated without further grouping. It is now easy to see that Richard A.’s score of 90 gives him a rank of thirteen in a class of thirty-eight, or about one third of the way from the top. In a similar manner, it is easy to interpret each of the other scores in terms of rank. But this method, especially in classes of twenty or more pupils, is likely to prove unsatisfactory. Note, for example, that two pupils make a score of 97. Since it is not correct to say that one ranks higher than the other, it is necessary to assign them fractional ranks. As there are six pupils who rank higher, the next two ranks, 7 and 8, are averaged, which gives 7.5. In like manner the average of ranks 9 and 10 is 9.5, and so on for the other pupils with tied scores. Since there are three pupils each of whom makes a score of 75, and there are twenty-one pupils who rank above this score, the average of the next three ranks, 22, 23, and 24, is 23, which is the rank assigned each of the scores of 75. In addition to the fact that time and trouble are required to determine these ranks, the list is long and unwieldy to handle, and is inadequate for making comparisons with other classes which are much larger or much smaller.
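
The averaging of tied ranks described above can be sketched in a few lines of Python. This is only an illustration, not part of the original procedure: the function name `average_ranks` is hypothetical, and the list holds the thirteen highest scores of the class as they can be read from the class record.

```python
from collections import Counter


def average_ranks(scores):
    """Return {score: rank}, ranking from high to low and giving tied
    scores the average of the ranks they jointly occupy."""
    counts = Counter(scores)
    ranks = {}
    next_rank = 1  # first rank available to the next (lower) score
    for score in sorted(counts, reverse=True):
        tied = counts[score]
        # Two pupils sharing ranks 7 and 8 each receive 7.5, and so on.
        ranks[score] = next_rank + (tied - 1) / 2
        next_rank += tied
    return ranks


# The thirteen highest scores in the class, from high to low.
top_scores = [112, 109, 106, 105, 104, 100, 97, 97, 95, 95, 93, 91, 90]
ranks = average_ranks(top_scores)
print(ranks[97], ranks[95], ranks[90])
```

Run on these scores, the sketch gives the two 97's a rank of 7.5, the two 95's a rank of 9.5, and the score of 90 a rank of 13, in agreement with the text.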
The frequency table or distribution. One way out of the difficulty is to arrange the scores in a special way. Such a process is called tabulation. The table itself is called a frequency distribution, or merely distribution.

TABLE 3

A Class Record for a Reading Readiness Test
(38 Pupils)

Pupil           Score
Richard A.        90
Robert B.         66
Barbara B.       106
Charles B.        84
Mildred C.       105
Robert C.         83
Robbin C.        104
Diney D.          82
Jim D.            97
John D.           97
Robert D.        ...
Don F.            59
Larry F.          95
Richard G.        78
Warren H.         70
Sylva H.          47
Robert H.         95
Grover H.        100
Jack K.           69
Clarence K.       44
Jerome L.         80
Mary M.           75
Billy N.          75
Nancy O.          51
Carrie P.        109
Ralph R.          89
Billy S.          58
William S.        59
Gretta S.         72
George S.         74
Robert S.         75
Jack S.           81
Richard S.        71
Mary S.           68
Jean T.          112
Richard W.        62
Dolores W.        91
Carl W.           93




The third and fourth columns of Table 4 show the simplest form of a distribution. Such a distribution consists of two columns: the various scores are arranged in one column in order of size, and opposite each score is recorded in the other column the number of times it occurs. Each entry in the second column is called a frequency, abbreviated f, and the total is represented by N.

It is usually desirable, however, to carry the process one step further.

TABLE 4

Reading Readiness Scores from Table 3 Arranged in Order of Size and Rank Order and Tabulated






As a rule, there is so wide a range of scores that it is economical to group them according to size, such as a group including all scores from 110 to 114, inclusive, from 105 through 109, inclusive, and so on. Each group is called a class. The complete grouping arrangement is usually referred to as a grouped frequency distribution. While there is no absolutely fixed rule for the number of classes, it is usually advisable to make not fewer than twelve classes nor more than about fifteen. To have fewer than twelve classes is to run the risk of distorting the results, while more than fifteen classes produces a table that is inconvenient to handle.

Making the frequency table. There are four steps in making the ordinary grouped frequency distribution. These are illustrated in Table 5, using the scores given in Table 4.

1. Determine the range, which is one more than the difference between the highest score and the lowest. Of these scores, the highest is 112 and the lowest is 44, which gives a range of (112 − 44) + 1 = 69.

2. Select the class interval, which is the size of the groups into which the scores are to be classified. To do this, divide the range by 12, which gives the largest group, or class interval, to be used, and by 15, which gives the smallest class interval to be used. In this case, 69 ÷ 12 = 5.75, and 69 ÷ 15 = 4.6. Since it is impractical to use any class interval except a whole number, the fractions are disregarded and the next highest whole number is taken. The class interval will, therefore, be either 6 or 5. Of these class intervals it is best to choose the one which is more convenient to use. Odd-numbered intervals have whole-number midpoints when the class limits are fractional (end in .5), so usually they are to be preferred over even-numbered intervals, which have fractional midpoints. The midpoint of an odd-numbered group like 110-114, wherein there are 5 scores, is 112, ... being, for example, 108-113, the midpoint of this even-numbered group would be 110.5, which might result in more complex computations. Therefore, a class interval ... when the class limits are fractional.

3. Determine the limits of the classes ... To facilitate tabulation ... it will accommodate the lowest score, 44 ... whole-number class limit will be 5 points above ... at 45, the next at 50, and ...

4. Make the tabulation ... score opposite the class in which it falls ... necessary to have all the scores arranged in order of size ... process may require more time than the tabulation ... In the alphabetical list the first score is 90 ... with 90, a vertical line opposite the class which ... vertical line is drawn to indicate this score. The next score is 66. This falls in the class which begins at 65, so a line is made there. In the same way, a line is placed in the column opposite the appropriate class for each of the other scores. To indicate the fifth score in each class a diagonal line is drawn across the other four. This makes it easier to count the tallies in each class.
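
The four steps can be followed in a short Python sketch, offered here only as an illustration. Several things in it are assumptions rather than part of the original procedure: the variable names are hypothetical; the lowest class is started at a multiple of the class interval (which gives 40-44 here); and one score in the class roll (the eleventh, Robert D.'s) is not legible in Table 3, so the value 84 is assumed for it, chosen only because the frequency of 6 reported for the 80-84 class implies that the score lies in that class.

```python
import math
from collections import Counter

# Scores in roll-book order. ASSUMPTION: the eleventh score (Robert D.)
# is taken as 84 for illustration only; it is not legible in Table 3.
scores = [90, 66, 106, 84, 105, 83, 104, 82, 97, 97, 84, 59, 95, 78, 70,
          47, 95, 100, 69, 44, 80, 75, 75, 51, 109, 89, 58, 59, 72, 74,
          75, 81, 71, 68, 112, 62, 91, 93]

# Step 1: range = (highest - lowest) + 1
score_range = max(scores) - min(scores) + 1

# Step 2: largest and smallest desirable class intervals, taken up to
# the next whole number; prefer an odd interval (whole-number midpoints).
largest = math.ceil(score_range / 12)
smallest = math.ceil(score_range / 15)
interval = next((w for w in range(smallest, largest + 1) if w % 2 == 1),
                smallest)

# Step 3 (assumption): the lowest class begins at a multiple of the interval.
lowest_limit = (min(scores) // interval) * interval

# Step 4: tally each score under the lower limit of its class.
tally = Counter((s // interval) * interval for s in scores)

for lower in range(max(tally), lowest_limit - 1, -interval):
    print(f"{lower:>3}-{lower + interval - 1:<3}  {tally[lower]}")
print("N =", sum(tally.values()))
```

With these scores the sketch arrives at a range of 69, a class interval of 5, and fifteen classes running from 110-114 down to 40-44.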

TABLE 5

An Illustration of the Process of Making a Grouped Frequency Distribution

Original Scores (from Table 3)

90, 66, 106, 84, 105, 83, 104, 82, 97, 97, ..., 59, 95, 78, 70, 47, 95, 100, 69, 44, 80, 75, 75, 51, 109, 89, 58, 59, 72, 74, 75, 81, 71, 68, 112, 62, 91, 93

Steps in Making the Distribution

Step 1. Determining the range
    Highest score    112
    Lowest score      44
    Range = Difference + 1 = 68 + 1 = 69

Step 2. Selecting the class interval
    69 ÷ 12 = 5.8, largest class interval desirable
    69 ÷ 15 = 4.6, smallest class interval desirable
    (5 chosen because of convenience in tabulation.)

Steps 3 and 4. Determining the limits of the classes and making the tabulation

Whole-Number
Limits of Classes    Tabulation    Frequency
    110-114          |                 1
    105-109          |||               3
    100-104          ||                2
     95-99           ||||              4
     90-94           |||               3
     85-89           |                 1
     80-84           ||||/ |           6
     75-79           ||||              4
     70-74           ||||              4
     65-69           |||               3
     60-64           |                 1
     55-59           |||               3
     50-54           |                 1
     45-49           |                 1
     40-44           |                 1
                                   N = 38

The finished table omits the steps by which it was made. In the simplest form of a frequency distribution only two columns occur, the first of which shows the various classes, usually arranged in descending order, and the second of which shows the frequency, or the number of scores in each class. When two or more schools or grades are to be compared, it is usually best to include all the data in the same table. In that case there will be a column for the classes into which the scores are grouped and one for each of the schools or grades being compared. Table 6 shows a frequency table which combines the record of six schools on a certain test. The number of grouping intervals varies from ... for School F to 17 for Schools A and D.


TABLE 6

Distribution of Reading Readiness Scores for Six Schools in ...

[Table body not legible. Columns: score classes in intervals of 5, from 120-124 down to 10-14; frequencies for School A, School B, School C, School D, School E, and School F; frequencies for All Six Schools.]


... mechanical ... be employed, but the ... the title of the table, or it may ... The table usually begins with ... horizontal line. Another horizontal line separates the column headings from the body of the table, and other horizontal ... summarizing measures which may be given under the table ... used to separate the columns ... margins of the page. It is considered good form to avoid abbreviations in the table whenever possible, and to make the title and headings full enough to indicate clearly the contents of the table.

A two-way table, scattergram, or scatter diagram. Table 7 shows the chronological, educational, and mental ages of the 20 students in a certain eighth-grade class. It is sometimes helpful to compare at the same time pupils’ scores on two measures. A two-way table, called a scattergram or scatter diagram, makes this easier. Table 8 contains a two-way distribution of mental and educational ages from Table 7.

TABLE 7

The Chronological, Educational, and Mental Ages of the 20 Pupils in an Eighth-Grade Class

                   Age Expressed in Months
Pupil    Chronological    Educational    Mental
             (CA)            (EA)         (MA)
  A           ...             188          208
  B           147             186          ...
  C           158             185          201
  D           160             188          ...
  E           141             163          165
  F           160             182          191
  G           154             181          ...
  H           164             180          193
  I           165             179          181
  J           167             176          165
  K           107             176          187
  L           157             176          176
  M           161             170          180
  N           157             175          166
  O           158             171          197
  P           161             173          154
  Q           179             171          180
  R           152             167          165
  S           160             167          177
  T           156             165          164

Mental ages, grouped into class intervals of six months, make up the column headings; educational ages, grouped into class intervals of two months, constitute the rows. For example, Pupil A, with an MA of 208 months and an EA of 188 months, falls in the third column from the right, or 204-209 class, and in the top row, or 188-189 class. In like manner, the horizontal position of each pupil in the distribution shows his MA, and the vertical position shows his EA. A tendency will be observed for the scores to arrange themselves in a diagonal pattern from lower left to upper right. This means that, in general, pupils who are low in MA are low in EA, and pupils who are high in MA are high in EA. However, a few exceptions stand out. For example, Pupil P, who is lowest in MA, is in the fifth row from the bottom in EA.

TABLE 8

A Two-way Distribution of Mental Age and Educational Age for an Eighth-Grade Class
(r = .65)

When the identification of individual pupils in the scattergram is unimportant, the totals only are entered in the appropriate squares (the cells). For Table 8 all such entries would be 1’s, though in the table on page 92 numbers in the cells run as high as 4.
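
The placing of pupils in the cells of a two-way table can be sketched as follows. This is an illustration under stated assumptions, not the book's own procedure: the helper `class_limits` is hypothetical, each class is assumed to begin at a multiple of its width (which reproduces the 204-209 and 188-189 classes named above), and only a few pupils, as read from Table 7 and the text, are included.

```python
from collections import Counter

# (pupil, EA, MA) in months for a few pupils, as read from Table 7.
pupils = [("A", 188, 208), ("F", 182, 191), ("H", 180, 193),
          ("I", 179, 181), ("P", 173, 154), ("S", 167, 177), ("T", 165, 164)]

MA_WIDTH = 6  # mental-age classes of six months (columns)
EA_WIDTH = 2  # educational-age classes of two months (rows)


def class_limits(value, width):
    """Whole-number limits of the class containing value, assuming each
    class starts at a multiple of its width (e.g. 204-209, 188-189)."""
    lower = (value // width) * width
    return lower, lower + width - 1


# Count how many pupils fall in each (EA class, MA class) cell.
cells = Counter()
for name, ea, ma in pupils:
    cells[(class_limits(ea, EA_WIDTH), class_limits(ma, MA_WIDTH))] += 1

# List the occupied cells from the highest EA row downward.
for (ea_class, ma_class), count in sorted(cells.items(), reverse=True):
    print(f"EA {ea_class[0]}-{ea_class[1]}  "
          f"MA {ma_class[0]}-{ma_class[1]}  f = {count}")
```

Each of these pupils occupies a cell alone, so every printed frequency is 1; with a larger group several pupils would share a cell and only the totals would be shown.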

C. Some Elementary Notions Concerning Quantitative Data

Concepts versus computations. Two of the most important concepts that apply to various kinds of test data are variability and central tendency. These abstract notions are useful in summarizing the main features of a bewildering mass of figures. There are several commonly used measures of variability and of central tendency. It is possible to be an adept computer of these measures without having a clear grasp of their meaning. Likewise, it is possible to understand the concepts reasonably well without being a competent computer, though a purely verbal knowledge of them is likely to be unsatisfying and inaccurate.

In order to read test manuals and other measurement literature effectively, one needs a real understanding of the concepts of variability and central tendency. As an analyzer of test scores and similar material, one needs to acquire facility in computing a number of statistics. Comparatively little of this computational ability can be picked up in a short measurements course. Perhaps all but the rudiments of calculation are best taught in a statistics course or an integrated measurements-statistics sequence.

Let us defer the computational routines for a few pages in an attempt to consider the concepts themselves. An imaginary informal visit with a rookie teacher may help illustrate one aspect of variability and central tendency in a concrete situation. If you become curious concerning some of the statistics reported, look at pages 81-85 for computational procedures.

Joe’s grading dilemma. Joe Doe, a brand new English teacher, is faced with the problem of marking his first batch of English themes. He asked the 31 pupils in his tenth-grade class to “write about 500 words on your favorite sport.” Now the papers are before him, the evening is young, and he wonders how to assign grades.

“Well,” says Joe, “these papers differ in lots of ways. Lee Littlesay wrote less than half a page, while Eve Effervescent managed to use up seven pages. Furthermore, the quality of handwriting varies greatly from one student to another, the girls being in general much better than the boys. Some of these individuals can’t spell, either, while others can. Hmm, if a pupil misspells the same word ten times, does that count off the same as misspelling ten different words once each? Some youngsters have something to say but say it ungrammatically, and others are uninteresting but grammatically correct. It’s obvious that there could be at least half a dozen different grades for each student. Perhaps I should try to give each paper a single ‘overall’ grade first.” Let’s help him mull through several ways to do this.

One category. The simplest method would be to mark each of the 31 papers with a check mark (√), indicating that it is satisfactory. This would result in the distribution of scores in Table 9.

TABLE 9

An Extremely Simple Scoring Method

Score    Number or Frequency, Abbreviated “f”      %
  √                     31                        100


Two categories. But it is obvious that, though easy on teacher and pupils alike, this procedure lumps everybody into a single category and does not discriminate among them at all. Suppose, instead, Joe uses a check mark for “satisfactory” and a minus sign (−) for “unsatisfactory.” Then ... he consolidates the scores into the frequency distribution for “Two Categories” shown in Table 10.


TABLE 10

Five Consecutive Exploratory Steps in Assigning “Overall” Scores to English Themes

I. One Category

Score     f      %
  √      31    100

II. Two Categories

Score     f      %
  √      26     84
  −       5     16

III. Three Categories

Score     f      %
  +      12     39
  ±      14     45
  −       5     16

IV. Four Categories

Grade    Quality Points     f      %
  A            3            4     13
  B            2            8     26
  C            1           14     45
  F            0            5     16
                      N =  31

Mean = 1.35
Median = 1.25
Mode = 1
Interquartile range = 0.70 to 2.03 = 1.33
Q = 0.67
Standard deviation = 0.90

V. Eleven Categories

Rank     Grade    Score     f
  1       A+       10       1
  2       A         9       1
  3.5     A−        8       2
  5.5     B+        7       2
  8       B         6       3
 11       B−        5       3
 14.5     C+        4       4
 20       C         3       7
 25       C−        2       3
 28.5     D         1       4
 31       F         0       1
                      N =  31

Twenty-six themes struck Joe as being “satisfactory,” while he finally managed to classify 5 as “unsatisfactory.” Thus 5 students out of 31, 16 per cent of the class, failed. The other 84 per cent passed.

Three categories. Do the “satisfactory” themes deserve grades of A, or B, or C? Joe doesn’t know, since his two-category grading system fails to reveal any differences among the 26 students who earned check marks. Therefore he rescores these 26 papers, assigning a plus (+) to the “excellent” themes and a plus-minus (±) to the remaining ones. This gives him the three-category frequency distribution in Table 10: 12 (39 per cent) of the students submitted “excellent” themes, 14 (45 per cent) turned in “satisfactory” ones, and 5 (16 per cent) were “unsatisfactory.”

Four categories. It is easy enough for Joe to decide that the lowest 5 pupils deserve grades of “F” and the middle 14 should get “C’s,” but what about the top 12 persons? Not all of them did “A” work, thinks Joe, so he goes through the 12 best papers again, dividing them into 4 “A’s” and 8 “B’s.” This gives him a new frequency distribution, the four-category one. Now the percentages are A, 13 per cent; B, 26 per cent; C, 45 per cent; and F, 16 per cent. Thank goodness, muses Joe, I’ve finally spread these papers out enough to assign the usual grades. The simple check marks I started with didn’t disperse the pupils at all, because 100 per cent of them got the same grade. Now less than half of the individuals earn any one grade.
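
The percentages Joe reports are simply each frequency divided by the total of 31 and rounded to the nearest whole per cent. A minimal sketch, with the hypothetical helper name `percent_distribution`, is:

```python
def percent_distribution(frequencies):
    """Map each category to (frequency, whole-number per cent of N)."""
    n = sum(frequencies.values())
    return {cat: (f, round(100 * f / n)) for cat, f in frequencies.items()}


# Joe's four-category distribution of the 31 themes.
four_categories = {"A": 4, "B": 8, "C": 14, "F": 5}
for grade, (f, pct) in percent_distribution(four_categories).items():
    print(f"{grade}: f = {f:2d}, {pct} per cent")
```

The same function applied to the two-category frequencies (26 and 5) returns 84 and 16 per cent.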

Variability. Dimly Joe recalls something his measurement teacher said about “scatter,” “dispersion,” “heterogeneity,” variability. All we do with letter grades of A, B, C, and F is to put each pupil into one of four ordered categories, A being the highest and F the lowest. That’s what college “quality” or “honor” points were for, to indicate the rank or value of each category. A grade of A earned 3 quality points, B 2, C 1, and F 0. A bit of fancy calculation by Joe reveals that the range of the middle 50 per cent of the English class (that is, excluding the highest fourth and the lowest fourth) is from 0.70 to 2.03 quality points, a distance of 1.33 quality-point units. The type of computational procedure that he used is explained on pages 81-85.

Joe becomes curious about the range of the middle 50 per cent in the preceding distributions. Going back to the three-category distribution in Table 10, he calls the minus category 0, the plus-minus category 1, and the plus category 2. This makes the range of the middle half of the class 1.16 quality-point units, 0.17 less than when there were four categories.

For the two-category distribution, where a check mark is 1 and a minus 0, the middle-half range is only 0.60, 0.56 less than the 1.16 secured for three categories and less than half the range of 1.33 for four categories. This pleases Joe, since it seems to confirm his hunch that variability increases in direct relationship to the number of categories. If this is so, he continues, why not put each of the 31 individuals into a category by himself, that is, why not rank-order the papers from 1 (highest) to 31 (lowest), with no ties? Then the range of the middle 50 per cent would be 15.5.

Eleven categories. The rank ordering goes along fairly well with the A and the F papers, but it is troublesome indeed for those who earned C’s. Despite his best efforts, Joe ends up with quite a few ties. The final result is that he settles for 11 categories rather than 31, which he labels A+, A, A−, B+, B, B−, C+, C, C−, D, and F, as shown in Table 10. The middle-50 percentile range of the scores in this frequency distribution is 3.5, much larger than the 1.33 range for four categories. This represents just about all the fineness of grading Joe feels he can obtain for an overall score on these themes.
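
Joe's middle-50-per-cent ranges can be reproduced with a short sketch. The method shown is an assumption about what his "fancy calculation" was: each whole-number score x is treated as covering the interval from x − 0.5 to x + 0.5, with its cases spread evenly over that interval, and the points below which one fourth and three fourths of the cases fall are found by interpolation. The function names are hypothetical.

```python
def percentile_point(freq, fraction):
    """Score point below which `fraction` of the cases fall.

    `freq` maps whole-number scores to frequencies. Each score x is
    treated as the interval x - 0.5 to x + 0.5, with its cases spread
    evenly across that interval.
    """
    n = sum(freq.values())
    target = fraction * n
    below = 0
    for score in sorted(freq):
        f = freq[score]
        if f > 0 and below + f >= target:
            return (score - 0.5) + (target - below) / f
        below += f
    raise ValueError("fraction must lie between 0 and 1")


def middle_half_range(freq):
    """Distance between the points cutting off the lowest and highest fourths."""
    return percentile_point(freq, 0.75) - percentile_point(freq, 0.25)


distributions = {
    "two categories": {0: 5, 1: 26},
    "three categories": {0: 5, 1: 14, 2: 12},
    "four categories": {0: 5, 1: 14, 2: 8, 3: 4},
    "eleven categories": {0: 1, 1: 4, 2: 3, 3: 7, 4: 4, 5: 3,
                          6: 3, 7: 2, 8: 2, 9: 1, 10: 1},
    "31 ranks, no ties": {rank: 1 for rank in range(1, 32)},
}
for name, freq in distributions.items():
    print(f"{name}: {middle_half_range(freq):.2f}")
```

Under these assumptions the sketch gives 0.60, 1.16, 1.33, 3.50, and 15.50, the same figures Joe obtains in the text.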


Joe realizes that some English classes will vary more in the quality of themes submitted than will others. The college-preparatory class is probably less variable with respect to theme-writing ability than is his “general” section, so in that group the dispersion of grades would probably be less. This depends considerably on the grading philosophy of the individual teacher, of course, and on the care with which he goes over the themes. Some teachers give mainly B’s, while others give both A+’s and F’s. The ... variation would be one who gives half the class A+’s and the other half F’s, with no B’s, C’s ...

Variability of theme grades is dependent upon (1) real differences in ... group. More commonly, the semi-interquartile ... middle-50 percentile range (3.5 ÷ 2 = 1.75) ... percentile ... distribution from the bottom of the ... the top, the highest ... with 0 and go through ... 10, and you will see readily that the range of the 11-category distribution ... approximately ... the middle two-thirds ...



For the distribution of scores 0-10 in Table 10, the standard deviation is 2.5. The greater the range of talent within a group of persons tested with a particular test, the greater the standard deviation of that group will be. A group for whom the standard deviation of scores is large is said to be “heterogeneous,” while one in which all individuals earn about the same scores is “homogeneous.” These are relative terms, however. Virtually all groups show some scatter on any test.

You would expect the geography test scores of pupils in Grades 4, 5, and 6 combined to be more variable than the scores of Grade 5 alone on the same test, because probably a majority of the Grade 4 students know less geography than the average fifth grader, while most of the sixth graders know more. This would cause the standard deviation of the total group to be substantially larger than the SD of scores for any one of the three grades.

Administer a third-grade spelling test to college sophomores and most of them will get all the words right, so variability of the scores will be slight. Give a seventh-grade arithmetic test to third graders and most of them will miss almost everything. In each instance the standard deviation of the scores will be quite small compared with the SD that would have resulted had the test been given to a group for which it was of adequate difficulty.

The concept of central tendency. Because scores on a test vary from
person to person, some statistic like the standard deviation is needed to
summarize the extent of this variability without relying solely on intuitive
study of the whole frequency distribution. Likewise, because the various
persons tested earn different scores, it is not possible to cite any one figure
that is completely typical of all individuals. The "modal" score or crude
mode of the 11-category distribution in Table 10 is 3, since more persons
(7) received it than any other single score.
The middle score category is 5, but 19 scores lie below it and only 9 lie
above. The nearest thing to a really middle score is 4, with 15 frequencies
below and 12 above.

Actually, the point that cuts the distribution into halves, with 15.5 frequencies
falling above that point and 15.5 falling below, is 3.625. This is
known as the median. It is the statistic with which Q, the quartile deviation
or semi-interquartile range, is usually linked. The range Q distance on each
side of the median [3.625 ± (3.5 ÷ 2) = 1.9 to 5.4] is similar to the interquartile
range. Here the interquartile range is 2.4 to 5.9, with a midpoint
at 4.167, which is 0.54 higher than the median.
When your arithmetic text referred to the average, it meant the sum of
all the scores divided by the total number of scores. This is often called the
arithmetic mean, or simply the mean. Like the mode and the median, it is
a measure of central tendency, indicating the point in the frequency distribution,
usually near the center, toward which the scores tend. For the





11-category distribution the mean is 129 divided by 31, or 4.2. Compare
this with the median of 3.6 and the crude mode of 3.
The mean grade is about 1/6th of the distance between C+ and B−. The
median grade is 5/8ths of the distance between C and C+. The modal grade
is C. The mode is based upon few cases and therefore too untrustworthy
for use except as a rough, quick estimate of central tendency. The mean
is the preferred measure, except in certain special cases.

Mean versus median. Suppose that the D category in the frequency
distribution had been divided into three parts, with new low scores of −1
and −2 as shown in Table 11.


TABLE 11

Dividing the Four D's from the 11-Category Distribution of
Table 10 into Three Parts

Grade    Score    f
D+         1
D          0
D−        −1
F         −2      1


‘l>e mode, but the ne,T 

end siwper “r between S2000 

who«cfrcq„e„c5 .3 5 T?he U3r„r 5,5000” 

d«.mble m eueh .nstnnees S„1 “ ‘be mean may be 

the fue cxeculues from the distnhT' the better procedure is to exclude 
mteb Then .heOo nrbemClTrmlV” 

the mean ma} be used ^oraogeiieous group for which 

"r - 

of -) i-ents Tlie third was a workman wli ' ivorldlj assets 

3CU totnied $2000 The fourth ma^d “n nn^ “‘ber as- 

''as a multimillionaire with a net I he fifth 

a of the group were 25 ppi 1 “ *'^^^0 000 Therefore the 
thcpcrrocpcrfcctu.botum^,""^ bbrnre descr.bcs tno of 
nwhan llRure of $2000 docs l,t^ jist. ‘be other three The 

TI.C mean of $1 003 400 « not \ era ? "“P‘ the workman 

satisfactory even for the multimillionaire. If we had to choose one measure
of central tendency, very likely it would be the mode, which describes
40 per cent of this group accurately. But if told that "the modal assets of
five persons sitting on a park bench are 25 cents," we would be likely to
conclude that the total assets of the group are approximately $1.25, which
is about $5,000,000 lower than the correct figure. Obviously, no measure
of central tendency whatsoever is adequate for these "strange bedfellows,"
who simply do not "tend centrally."

D. Finding the Mode, the Median, and the Mean
Central tendency. Characteristic of most frequency distributions is a
tendency for the scores to bunch or concentrate somewhere near the center.
An important statistic is, therefore, the point on the scale around which
the scores tend to group themselves. This is a measure of central tendency.

It is that value which typifies, or best represents, the whole distribution.
One might wish to know which of several schools made the best record
on a certain test, and which the poorest. To determine this, compute an
average for each school, and then note which one has the highest average
and which one has the lowest average. In other words, that school is best
which on the average makes the highest score, and that school is poorest
which on the average makes the poorest score.*

Statisticians employ three common averages. These are the mode, or
inspectional average; the median, or counting average; and the mean, or
computed average. The meaning of each of these will now be considered.
The mode. The most frequent score is called the mode. It is obtained
by inspection. In Table 4 on page 63 the mode is 75, because more pupils
made that score than any other. The mode is not a very trustworthy average,
however, especially with small groups. In this case the changing of
two scores might shift the mode decidedly. If one of the pupils who made
75 had made 76, and if the one who made 58 had made 59, the mode would
drop to 59, since more pupils would then have made that score than any
other. Largely because of its fickleness, the mode is not highly regarded
as a measure of central tendency for small groups.

The median. Perhaps the most widely used average in educational
measurement is the median. The median is the point which divides the distribution
into halves. Sometimes in an ungrouped series the midscore is used
instead of the median. Strictly speaking, when N is an even number, there
is no midscore. In that case, it is customary to average the middle pair of
scores. For example, in Table 4 on page 63 there are 38 pupils, 19 of whom
made scores of 80 or less and 19 of whom made scores of 81 or more. The
midscore is then assumed to be the average of 80 and 81, the middle pair
of scores, which equals 80.5. The terms median and midscore are often used
* Obviously, in order to be statistically and educationally significant, the difference
between high and low schools should be fairly large.





interchangeably, but the latter should be used only for scores arranged
in order of size rather than in a grouped frequency distribution.
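
The midscore rule can be written out as a short Python sketch. The function name `midscore` and the six scores are hypothetical; the scores were chosen only so that the middle pair is 80 and 81, as in the example from Table 4.

```python
def midscore(scores):
    """Middle score of an ungrouped series.

    When N is odd the middle score is returned; when N is even the
    middle pair of scores is averaged.
    """
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)          # arrange the scores in order of size
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# Hypothetical scores, not the 38 scores of Table 4.
scores = [93, 72, 81, 75, 88, 80]
print(midscore(scores))
```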

TABLE 12

The Process of Locating the Median
(Scores from Table 5)

Steps in the Process

Step 1. Obtaining N/2: 38 ÷ 2 = 19.
Step 2. Locating the approximate median: 1 + 1 + 1 + 3 + 1 + 3 + 4 + 4 = 18.
        This takes us up to 79.5, which is the approximate median.
Step 3. Determining the correction: 19 − 18 = 1; 1/6 × 5 = 0.8, the correction.
        (The divisor 6 is the frequency of the 80-84 class.)
Step 4. Locating the median: 79.5 + 0.8 = 80.3, the median. That is, the median
        is the approximate median plus the correction.

Table 12 illustrates the process of locating the median in a frequency
distribution. The median is often called the counting average, and
it will be noted that counting does play a large part in its location.
The steps may be summarized as follows:

1. Obtain N/2. Divide the sum of the frequencies by 2. Here N/2 is 19.
2. Locate the approximate median. Beginning at the bottom of the distribution,
count up the frequency column as far as possible without passing
the N/2 obtained in Step 1. In this case 1 + 1 + 1 + 3 + 1 + 3 + 4 + 4
give a total of 18. We stop here, for to include the next frequency, 6,
would carry us beyond N/2, which is 19. The approximate median then is
79.5, the point between the classes 75-79 and 80-84.*

3. Determine the correction needed. Subtract the total obtained in Step 2
from N/2. Here 19 − 18 = 1; that is, 1 more score is needed to obtain the
required N/2 scores. And this score must come out of the next class, the
80-84 class, where there is a frequency of 6. That is, we must go 1/6 of
the distance into the next class. As the class interval is 5, this means
1/6 of 5, or 0.8. The correction is then 0.8.
4. Obtain the median. This is done by adding the correction to the approximate
median. In this case 79.5 + 0.8 = 80.3, the median.

* Each class is taken to extend from half a unit below its lowest score to half a
unit above its highest; the 75-79 class, for example, extends from 74.5 to 79.5. The
scores within a class are considered to be evenly distributed over it.


5. Check by counting down: 84.5 − (5/6 × 5) = 84.5 − 25/6 =
84.5 − 4.2 = 80.3.
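
The five steps above can be sketched in Python as follows. The class frequencies are those of the 38 scores used throughout this section; the names `CLASSES`, `INTERVAL`, `median_counting_up`, and `median_counting_down` are illustrative choices, and the sketch assumes, as the steps do, that scores are spread evenly within each class.

```python
# Frequency distribution of Table 12, lowest class first.
# Each pair is (exact lower limit of the class, frequency).
CLASSES = [(39.5, 1), (44.5, 1), (49.5, 1), (54.5, 3), (59.5, 1),
           (64.5, 3), (69.5, 4), (74.5, 4), (79.5, 6), (84.5, 1),
           (89.5, 3), (94.5, 4), (99.5, 2), (104.5, 3), (109.5, 1)]
INTERVAL = 5  # class interval

def median_counting_up(classes, interval):
    half = sum(f for _, f in classes) / 2            # Step 1: N/2
    counted = 0
    for lower, f in classes:
        if f > 0 and counted + f >= half:
            # Step 2: `lower` is the approximate median.
            correction = (half - counted) / f * interval   # Step 3
            return lower + correction                      # Step 4
        counted += f
    raise ValueError("distribution has no frequencies")

def median_counting_down(classes, interval):
    # Step 5: the same count made from the top of the distribution.
    half = sum(f for _, f in classes) / 2
    counted = 0
    for lower, f in reversed(classes):
        if f > 0 and counted + f >= half:
            upper = lower + interval
            return upper - (half - counted) / f * interval
        counted += f
    raise ValueError("distribution has no frequencies")

median_up = median_counting_up(CLASSES, INTERVAL)
median_down = median_counting_down(CLASSES, INTERVAL)
print(round(median_up, 1), round(median_down, 1))
```

Counting up and counting down must agree; any difference signals an error in the tally.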

The median for the eleven-category distribution in Table 10 on page 70
was secured by counting up in the following manner:

1. Half of 31 is 15.5. 1 + 4 + 3 + 7 = 15, which carries us up to 3.5.
2. 15.5 − 15 = 0.5, the number of frequencies to be gone up into the
class that extends from 3.5-4.5 and has a total frequency of 4.
3. 0.5/4 × 1 = 0.5/4 = 0.1.
4. 3.5 + 0.1 = 3.6, the median.
5. This checks with the result obtained by counting down:
4.5 − (3.5/4 × 1) = 4.5 − 3.5/4 = 4.5 − 0.9 = 3.6.


The median is often used as a reference point for describing the location
of individual pupils in a distribution. A pupil in the higher half is said to be
"above the median," and one in the lower half is said to be "below the
median." Other points in the distribution are used in a similar manner.
For example, quartiles divide the distribution into fourths, and deciles
divide it into tenths. A pupil in the highest fourth is said to be "above Q3,"
and one in the lowest fourth is said to be "below Q1."

Quartiles should not be confused with quarters (fourths) of the distribution.
Persons scoring above Q3 are in the highest fourth of the group but
not in the "highest quartile," since this expression is meaningless. Likewise,
pupils scoring below Q1 are in the lowest fourth but not in the "lowest
quartile." There are only three quartiles, all of which are points rather than
ranges. Q2 is the median; Q0 and Q4 do not exist.
Similarly, there are 9 deciles, going from 1 through 9. A person may score
at the 2nd decile (the 20th percentile) or between the 2nd and 3rd deciles,
but not in the 2nd decile. Rather, we would say that he scored in the third
tenth of the group, counting from the bottom. There are no 0th and 10th
deciles.


* Table 14 on page 82 illustrates the computation of the quartiles.





The position of a certain pupil may be still more accurately described
by indicating the percentage of pupils who fall below him. The points that
divide a distribution into 100 equal divisions, or per cents, are called percentiles
or, more simply, centiles.

Computation of percentiles. The median is the 50th percentile, since
50 per cent of all the frequencies lie below that point and 50 per cent lie
above it. A percentile is a point in the score distribution below which the
stated percentage of all measures lies. Thus an individual who scores at the
30th percentile of his class has done better than 30 per cent of the students
and poorer than 70 per cent. Percentiles are computed in much the same
manner as the median, the only difference being that the number of frequencies
to be counted up depends upon the percentile desired.
The 30th percentile of the distribution in Table 12 is obtained as follows:


1. 30% of N = .30 × 38 = 11.4. 1 + 1 + 1 + 3 + 1 + 3 = 10, which
carries us up to 69.5.
2. 11.4 − 10 = 1.4, the number of frequencies to be gone up into the
class that extends from 69.5-74.5 and has a total frequency of 4.
3. 1.4/4 × 5 = 1.75.
4. 69.5 + 1.75 = 71.25, the 30th percentile.
5. Check by counting down 100% − 30% = 70% of the way:
   (1) 0.70 × 38 = 26.6. 1 + 3 + 2 + 4 + 3 + 1 + 6 + 4 = 24.
   (2) 74.5 − ((26.6 − 24)/4 × 5) = 74.5 − (2.6 × 5)/4 = 74.5 − 13/4
       = 74.5 − 3.25 = 71.25.
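
Because every percentile is found by the same counting process, the steps can be gathered into one small Python function. This is a sketch under the same assumption of evenly spread scores within a class; `percentile_point` and the constant names are hypothetical, and the frequencies are those of Table 12.

```python
# Frequency distribution of Table 12, lowest class first:
# (exact lower limit of the class, frequency).
CLASSES = [(39.5, 1), (44.5, 1), (49.5, 1), (54.5, 3), (59.5, 1),
           (64.5, 3), (69.5, 4), (74.5, 4), (79.5, 6), (84.5, 1),
           (89.5, 3), (94.5, 4), (99.5, 2), (104.5, 3), (109.5, 1)]
INTERVAL = 5  # class interval

def percentile_point(classes, interval, percent):
    """Score point below which `percent` per cent of the frequencies fall."""
    if not 0 < percent < 100:
        raise ValueError("percent must lie between 0 and 100")
    n = sum(f for _, f in classes)
    needed = percent / 100 * n       # frequencies to be counted up
    counted = 0
    for lower, f in classes:
        if f > 0 and counted + f >= needed:
            # Go the required fraction of the way into this class.
            return lower + (needed - counted) / f * interval
        counted += f
    raise ValueError("distribution has no frequencies")

p30 = percentile_point(CLASSES, INTERVAL, 30)
p50 = percentile_point(CLASSES, INTERVAL, 50)   # the median
print(round(p30, 2), round(p50, 1))
```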


The 30th percentile of this distribution is 71.25, a point above which
26.6 (70 per cent) of the scores lie and below which 11.4 (30 per cent) fall.
At first it may be difficult to think of the 50th percentile as
"average" because of the percentage passing mark of 70 or 75
that is quite common in high schools. There is no such thing as a
"failing" percentile. The decision to fail a student is an arbitrary
one. For roughly descriptive purposes the 25th and 75th
percentiles are sometimes used. Pupils in the lowest
fourth of a group are said to be "low," while those in the highest fourth
are called "high."

The mean. The mean is also called the arithmetic mean. In fact, this
measure is in such common use that the ordinary
person regards it as the average, since it is the only average he
knows anything about. When "an average" is met with in ordinary




conversation or the newspaper in such statements as "average temperature,"
"average rainfall," "average yield of corn and wheat," "average
price," and the like, it is almost certain to be the mean that is meant. The
mean can be computed merely by obtaining the sum of the measures and
dividing by their number. The measure so obtained is then the value that
each individual would have if all shared equally.
When the scores are few in number or in an ungrouped series, the simplest
process of computing the mean is the one described above; that is, the
scores are first added and then this sum is divided by the number of scores.
For example, the sum of the 38 scores in Table 3 on page 62 is 3,050, and
3,050 ÷ 38 = 80.3. When the scores are sufficiently numerous to justify
the use of a frequency distribution, the so-called "short" method of computing
the mean may be more convenient.

$$M = M' + \frac{i \times \Sigma fd}{N}$$

In this formula:
M = mean,
M' = assumed mean (the midpoint of any class),
Σfd = sum of frequencies multiplied by their respective deviations (Σ indicates
"sum of"),
N = total number of frequencies,
i = class interval.

The method of computing the mean by the short formula is illustrated
in Table 13. The steps in the process are as follows:

Step 1. Assume a mean. This is taken at the midpoint of some class. Any
class may be selected, even though completely outside the frequency
distribution, but choosing one near the center of the distribution
makes the figures to be handled much smaller. In this case,
the assumed mean is taken at the midpoint of the 80-84 class,
which is (80 + 84)/2 = 82.
Step 2. Lay off the deviations from the assumed mean. The plus deviations
indicate how many classes various frequencies are above the assumed
mean, and minus deviations indicate how many classes
various frequencies are below the assumed mean. This column is
headed d.
Step 3. Multiply each f by its corresponding d. This column is headed f × d.
The first product is 1 × 6 = 6, the second is 3 × 5 = 15, and so on.
Step 4. Obtain the algebraic sum of the f × d column. Note that the sum of
the + values is 48 and the sum of the − values is −61. The
algebraic sum of −61 and 48 is −13. Had the + values exceeded
the − values, the sum would have been +. This is called Σfd.





TABLE 13

A Short Way to Compute the Mean

Computation

Scores      f     d    f × d
110-114     1    +6     +6
105-109     3    +5    +15
100-104     2    +4     +8
 95-99      4    +3    +12
 90-94      3    +2     +6
 85-89      1    +1     +1
 80-84      6     0      0
 75-79      4    −1     −4
 70-74      4    −2     −8
 65-69      3    −3     −9
 60-64      1    −4     −4
 55-59      3    −5    −15
 50-54      1    −6     −6
 45-49      1    −7     −7
 40-44      1    −8     −8

N = 38; sum of + values = +48; sum of − values = −61; Σfd = −13

$$M = M' + \frac{i \times \Sigma fd}{N} = 82 + \frac{5 \times (-13)}{38} = 82 - \frac{65}{38} = 82 - 1.7 = 80.3$$

Steps in the Process

Step 1. Assuming a mean. 82 is taken as the assumed mean. Two parallel lines
        indicate the class where it is the midpoint.
Step 2. Laying off deviations from assumed mean. This is the column headed d.
Step 3. Multiplying each f by its d. This column is headed f × d.
Step 4. Obtaining algebraic sum of the f × d column. Sum of + values is 48.
        Sum of − values is −61. Algebraic sum is −13. This is Σfd.
Step 5. Substituting proper values in the formula, as indicated at the lower left.





Step 5. Substitute in the formula M = M' + (i × Σfd)/N. M' was found in
Step 1 to be 82. The class interval, i, is 5, since there are five whole
numbers in each class: 40-44 means 40, 41, 42, 43, and 44. Σfd,
found in Step 4, equals −13. N = 38, the total number of scores.
Therefore M = 82 + (5 × −13)/38 = 82 − 1.7 = 80.3.


It should be noted that the mean when computed by the "long" method
may not be exactly the same as when computed by the "short" method.
A slight discrepancy may occur due to the fact that the short method
rests on the assumption that the midpoint of each class represents the
scores in that class.




The mean is influenced by extreme scores, and whenever it is desired to
avoid this influence, the median is to be preferred. As such situations often
arise in educational measurement, the median is widely used. For example, if the
test is too difficult, there may be several zero scores, and if the test is too
easy, there may be several perfect scores. But in neither case are the pupils
at the extremes correctly measured. The median in some such situations is
the best average to use. The median is also easier to find than the mean,
unless an electric calculator is available.
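
The short method of computing the mean, described in Steps 1 to 5 earlier in this section, can be sketched in Python. The function name `short_method_mean` and the constant names are illustrative only; the class midpoints and frequencies are those of Table 13, and 3,050 is the sum of the raw scores quoted in the text.

```python
# Class midpoints (42, 47, ..., 112) and frequencies of Table 13,
# lowest class first.
MIDPOINTS = list(range(42, 113, 5))
FREQS = [1, 1, 1, 3, 1, 3, 4, 4, 6, 1, 3, 4, 2, 3, 1]
INTERVAL = 5  # class interval, i

def short_method_mean(midpoints, freqs, interval, assumed_mean):
    n = sum(freqs)
    if n == 0:
        raise ValueError("distribution has no frequencies")
    # Step 2: deviation of each class from the assumed mean, in class units.
    deviations = [(m - assumed_mean) / interval for m in midpoints]
    # Steps 3 and 4: algebraic sum of the f x d column.
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))
    # Step 5: M = M' + i * (sum of fd) / N
    return assumed_mean + interval * sum_fd / n

mean_from_82 = short_method_mean(MIDPOINTS, FREQS, INTERVAL, 82)
mean_from_62 = short_method_mean(MIDPOINTS, FREQS, INTERVAL, 62)
long_method_mean = 3050 / 38   # sum of the 38 raw scores divided by N

print(round(mean_from_82, 2), round(mean_from_62, 2), round(long_method_mean, 2))
```

Two points are illustrated: the choice of assumed mean does not affect the result, and the short method may differ slightly from the long method because each score is treated as if it fell at the midpoint of its class.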


E. Measures of Variability or Scatter
Meaning of variability. No distribution is completely described by its
average or central tendency. Two classes in a school might have the same
average intelligence and yet be very unlike. The members of one class may
vary all the way from feeble-mindedness to the genius level, while all the
members of the other group may rate as normal. Obviously, these two
classes present different instructional problems because they differ in variability.
Variability is the extent to which the scores tend to scatter or spread
above and below the average. It is clearly important to have some convenient
method of indicating the variability of a group. There are three
common measures of variability: the range, the quartile deviation, and the
standard deviation. All these measures represent distances rather than
points, and the larger they are the greater the variability or scatter of the
scores.

The range. The range has already been referred to as the distance between
the lowest and the highest scores plus one.* It is usually a very
untrustworthy measure of variability.† The shift in a single score may
greatly alter the range and thereby materially increase or reduce the apparent
variability of the group. School A and School D in Table 6 on
page 66 illustrate this possibility.

The quartile deviation. A measure of variability that avoids being
unduly influenced by extreme scores is the quartile deviation, or Q. This is
one half the distance between the first and third quartiles. For this reason
it is often referred to as the semi-interquartile range. Since 25 per cent of
the scores fall below the first quartile, or Q1, and 25 per cent of the scores
exceed the third quartile, or Q3, the interquartile range is the range of the

• More precisely, the range is the ‘'‘= 

tion eoaeernmg van.bd.ly that the dialnbuUon c«. yield. 





middle 50 per cent of the scores. The whole interquartile range might be
used to express the variability of the group, but it is customary to take
only half this distance and to set up a new middle-half range extending
from Q below the median to Q above the median: Mdn ± Q. As already
noted, the middle of the interquartile range will not usually be the median,
while the middle of this new range will always be. On the other hand,
exactly half of all the frequencies lie within the interquartile range and
exactly half outside it, but this does not hold precisely for Mdn ± Q.
The formula used for obtaining Q is


$$Q = \frac{Q_3 - Q_1}{2} = \frac{\text{75th percentile} - \text{25th percentile}}{2}$$


TABLE 14

The Process of Computing the Quartile Deviation, Q

Frequency Distribution

Scores      f
110-114     1
105-109     3
100-104     2
 95-99      4
 90-94      3
 85-89      1
 80-84      6
 75-79      4
 70-74      4
 65-69      3
 60-64      1
 55-59      3
 50-54      1
 45-49      1
 40-44      1

N = 38

Steps in the Process

Step 1. Computing Q1, the 25th percentile.
        ¼N = ¼ of 38 = 9.5
        Counting up: 1 + 1 + 1 + 3 + 1 = 7; approximate Q1 is 64.5
        9.5 − 7 = 2.5; 2.5/3 × 5 = 12.5/3 = 4.17, correction
        64.5 + 4.17 = 68.67, Q1
Step 2. Computing Q3, the 75th percentile.
        ¾N = ¾ of 38 = 28.5
        Counting up: 1 + 1 + 1 + 3 + 1 + 3 + 4 + 4 + 6 + 1 + 3 = 28;
        approximate Q3 is 94.5
        28.5 − 28 = 0.5; 0.5/4 × 5 = 2.5/4 = 0.62, correction
        94.5 + 0.62 = 95.12, Q3

Formula: Q = (Q3 − Q1)/2
Substituting: Q = (95.12 − 68.67)/2 = 26.45/2 = 13.2


Table 14 illustrates the computation of Q. It will be observed that the
process of locating quartiles is like that of locating any other percentile.
In the first step, the fractional part of N indicates the proportion of the
distribution which falls below the desired point; that is, for Q1 it is ¼N and
for Q3 it is ¾N. There are four steps, as follows:








Step 1. Compute Q1, the 25th percentile. To begin with, ¼ of 38 is 9.5.
The next three steps in locating this point are exactly the same as
those in locating any percentile.
Step 2. Compute Q3, the 75th percentile. Here the first step is to take ¾N;
¾ of 38 is 28.5. The other three steps are identical with those in
locating any percentile.
Step 3. Substitute in the formula. Q3 is 95.12 and Q1 is 68.67. The difference
between them is 26.45. Half of this difference is 13.2, the value of Q.
Step 4. Check your work by counting downward.
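
The four steps above can be sketched in Python. The helper `percentile_point` and the other names are hypothetical; the frequencies are those of Table 14, and scores are again assumed to be spread evenly within each class.

```python
# Frequency distribution of Table 14, lowest class first:
# (exact lower limit of the class, frequency).
CLASSES = [(39.5, 1), (44.5, 1), (49.5, 1), (54.5, 3), (59.5, 1),
           (64.5, 3), (69.5, 4), (74.5, 4), (79.5, 6), (84.5, 1),
           (89.5, 3), (94.5, 4), (99.5, 2), (104.5, 3), (109.5, 1)]
INTERVAL = 5  # class interval

def percentile_point(classes, interval, percent):
    """Score point below which `percent` per cent of the frequencies fall."""
    n = sum(f for _, f in classes)
    needed = percent / 100 * n
    counted = 0
    for lower, f in classes:
        if f > 0 and counted + f >= needed:
            return lower + (needed - counted) / f * interval
        counted += f
    raise ValueError("distribution has no frequencies")

q1 = percentile_point(CLASSES, INTERVAL, 25)   # Step 1
q3 = percentile_point(CLASSES, INTERVAL, 75)   # Step 2
q = (q3 - q1) / 2                              # Step 3

print(round(q1, 2), round(q3, 2), round(q, 1))
```

Carrying unrounded values through the computation gives a Q3 of 95.125 rather than the 95.12 shown in Table 14; the value of Q is unaffected to one decimal place.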


The interpretation of Q and other measures of variability is a relative
matter. Whether a Q of 13.2 is to be considered great or small depends upon
the magnitude of comparable measures for other groups using the same test.
The standard deviation. A third measure of variability, which has
many uses in educational measurement, is the standard deviation. It is
usually represented by the Greek letter σ, called sigma, and defined as the
square root of the mean of the squares of the deviations of the scores from
their mean. It may also be defined as that range above and below the mean
(M ± 1σ) that in a normal distribution* includes 68.26 per cent, or
approximately two thirds, of the scores.
The formula for the standard deviation when computed from an assumed
mean for scores in a frequency distribution is

$$\sigma = \frac{i\sqrt{(N \times \Sigma fd^2) - (\Sigma fd \times \Sigma fd)}}{N} = \frac{i\sqrt{N\,\Sigma fd^2 - (\Sigma fd)^2}}{N}$$


The computational process is illustrated in Table 15. It can be seen that
the only term not used in the computation of the mean in Table 13 is
Σfd², the sum of each frequency times the square of its respective deviation.
The steps needed for computing the standard deviation are as
follows:

Step 1. Assume that the mean falls at the midpoint of a certain class, say
80-84.
Step 2. Lay off the deviations above and below the assumed mean.
Step 3. Multiply each f by its d.
Step 4. Obtain Σfd, the algebraic sum of the fd column. Here it is −13.
Step 5. Prepare the d × (f × d) column. Each entry in this column is the
product of a d and the f × d to its right.
Step 6. Obtain Σfd², the sum of the d × (f × d) column. All values in the
d × (f × d) column are positive, since negative deviations are
squared. Their sum is 479.
Step 7. Substitute in the formula for σ.
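
The seven steps can be sketched in Python as follows. The function name `short_method_sd` and the constant names are illustrative; the midpoints and frequencies are those of Table 15.

```python
from math import sqrt

# Class midpoints (42, 47, ..., 112) and frequencies of Table 15,
# lowest class first.
MIDPOINTS = list(range(42, 113, 5))
FREQS = [1, 1, 1, 3, 1, 3, 4, 4, 6, 1, 3, 4, 2, 3, 1]
INTERVAL = 5  # class interval, i

def short_method_sd(midpoints, freqs, interval, assumed_mean):
    n = sum(freqs)
    if n == 0:
        raise ValueError("distribution has no frequencies")
    # Steps 1 and 2: deviations from the assumed mean, in class units.
    deviations = [(m - assumed_mean) / interval for m in midpoints]
    # Steps 3 and 4: the f x d column and its algebraic sum.
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))
    # Steps 5 and 6: the d x (f x d) column and its sum.
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, deviations))
    # Step 7: sigma = i * sqrt(N * sum(fd^2) - (sum(fd))^2) / N
    sigma = interval * sqrt(n * sum_fd2 - sum_fd ** 2) / n
    return sigma, sum_fd, sum_fd2

sigma, sum_fd, sum_fd2 = short_method_sd(MIDPOINTS, FREQS, INTERVAL, 82)
print(sum_fd, sum_fd2, round(sigma, 1))
```

As with the mean, any class midpoint may serve as the assumed mean; the correction term (Σfd)² removes the effect of the choice.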

* The normal type of frequency distribution is discussed on pages 260 and 290.



TABLE 15

A Simplified Way to Compute the Standard Deviation

Computation

Scores      f     d    f × d   d × (f × d)
110-114     1    +6     +6        36
105-109     3    +5    +15        75
100-104     2    +4     +8        32
 95-99      4    +3    +12        36
 90-94      3    +2     +6        12
 85-89      1    +1     +1         1
 80-84      6     0      0         0
 75-79      4    −1     −4         4
 70-74      4    −2     −8        16
 65-69      3    −3     −9        27
 60-64      1    −4     −4        16
 55-59      3    −5    −15        75
 50-54      1    −6     −6        36
 45-49      1    −7     −7        49
 40-44      1    −8     −8        64

N = 38; sum of + values = +48; sum of − values = −61; Σfd = −13; Σfd² = 479

$$\sigma = \frac{i\sqrt{N\,\Sigma fd^2 - (\Sigma fd)^2}}{N} = \frac{5\sqrt{38 \times 479 - (-13)^2}}{38} = \frac{5\sqrt{18{,}202 - 169}}{38} = \frac{5\sqrt{18{,}033}}{38} = \frac{5 \times 134.3}{38} = \frac{671.5}{38} = 17.7$$

Steps in the Process

Step 1. Assume a class for the mean (80-84).
Step 2. Lay off deviations from assumed mean.
Step 3. Multiply each f by its d.
Step 4. Obtain Σfd, the algebraic sum of the f × d column. Here it is −13.
Step 5. Prepare the d × (f × d) column. Each entry is the product of d and
        the f × d opposite it.
Step 6. Obtain Σfd². This is merely the sum of the d × (f × d) column. Here it
        is 479.
Step 7. Substitute in the formula as indicated.


D, a useful percentile measure of variability. σ may be estimated
fairly accurately and easily by means of the formula

$$\sigma = 0.4 \times D = 0.4 \times (\text{90th percentile} - \text{10th percentile})$$

For the scores in Table 15, 0.4D would be secured as follows:

Step 1. Obtain the 90th percentile, which is a point that lies 10 per cent of
the way down into the score distribution. 10 per cent of 38 = 3.8, so
the 90th percentile = 109.5 − ((3.8 − 1)/3 × 5) = 109.5 − (2.8 × 5)/3
= 109.5 − 14.0/3 = 109.5 − 4.67 = 104.83.
Step 2. Obtain the 10th percentile, which is a point that lies 10 per cent
of the way up into the score distribution. The 10th percentile =
54.5 + ((3.8 − (1 + 1 + 1))/3 × 5) = 54.5 + (0.8 × 5)/3
= 54.5 + 1.33 = 55.83.


Step 3. Substitute in the formula σ = 0.4D: σ = 0.4 × (104.83 − 55.83) =
0.4 × 49.00 = 19.6. This may be compared with the value of σ in
Table 15, which rounds off to 18.
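
The three steps for estimating σ from D can be sketched in Python. The helper `percentile_point` and the other names are hypothetical; the frequencies are those of Table 15, with scores assumed evenly spread within each class.

```python
# Frequency distribution of Table 15, lowest class first:
# (exact lower limit of the class, frequency).
CLASSES = [(39.5, 1), (44.5, 1), (49.5, 1), (54.5, 3), (59.5, 1),
           (64.5, 3), (69.5, 4), (74.5, 4), (79.5, 6), (84.5, 1),
           (89.5, 3), (94.5, 4), (99.5, 2), (104.5, 3), (109.5, 1)]
INTERVAL = 5  # class interval

def percentile_point(classes, interval, percent):
    """Score point below which `percent` per cent of the frequencies fall."""
    n = sum(f for _, f in classes)
    needed = percent / 100 * n
    counted = 0
    for lower, f in classes:
        if f > 0 and counted + f >= needed:
            return lower + (needed - counted) / f * interval
        counted += f
    raise ValueError("distribution has no frequencies")

p90 = percentile_point(CLASSES, INTERVAL, 90)   # Step 1
p10 = percentile_point(CLASSES, INTERVAL, 10)   # Step 2
d_range = p90 - p10                             # D
sigma_estimate = 0.4 * d_range                  # Step 3

print(round(p90, 2), round(p10, 2), round(sigma_estimate, 1))
```

Here the 90th percentile is found by counting up 90 per cent of the way rather than down 10 per cent; the two counts lead to the same point.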


Though Q is used much more frequently than D, the latter is a far better
percentile measure of variability and is considerably easier to compute.
Practical uses of the standard deviation. The standard deviation is
the most important measure of the variability of test scores. A small standard
deviation means that the group has small variability or is relatively homogeneous,
while a large standard deviation means the opposite condition,
heterogeneity. σ also has certain other important uses.
The position of a pupil in a distribution is often represented in terms of
standard deviation units. In the distribution used in Tables 13 and 15,
where the mean is 80 and the standard deviation is 18, a pupil whose score
is 98 is said to be one standard deviation above the mean, and the score is
written +1σ. In like manner, a pupil whose score is 62 is said to be approximately
one standard deviation below the mean, and the score is written
−1σ. Such scores are called standard scores or z scores.*
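
Expressing a score in standard deviation units is a one-line computation. The sketch below uses the rounded mean of 80 and standard deviation of 18 quoted above; the function name `z_score` is an illustrative choice.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (score - mean) / sd

MEAN = 80   # rounded mean of the distribution in Tables 13 and 15
SD = 18     # rounded standard deviation of the same distribution

for score in (98, 80, 62):
    print(score, f"{z_score(score, MEAN, SD):+.1f} sigma")
```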

Which measure of variability is best? As a rule, σ is regarded as the best
measure of variability, and the range is undoubtedly the poorest. The
range is subject to all the limitations which the mode has as a measure of
central tendency. Just as the mean is greatly influenced by extreme scores,
so is σ. Whenever it is desirable, therefore, to avoid the influence of extreme
scores, the median is employed as a measure of central tendency, and with
it a percentile measure of variability such as Q or D. In like manner, when
the mean is used, σ is the appropriate measure of variability.


F. Measures of Relationship

The concept of co-relationship or concomitant variation. During
the latter part of the nineteenth century, Sir Francis Galton and the pioneer
statistician Karl Pearson succeeded in developing the theory and
mathematical basis for what is now known as correlation.† They were concerned
with relationships between two variates, for example, height and
weight. It is easy to note that tall persons usually weigh more than short
ones, suggesting that above-average height tends to go with above-average
weight. Height and weight vary together, though certainly not perfectly;
there are "beanpoles" and "five-by-fives" to upset the relationship. It
would be possible to select a group of individuals in such a manner that
the taller the person is the less he weighs, but this negative relationship
between height and weight is not to be expected for individuals picked at
random.


■ For a fuller discu"sion of i "o'" *“ EifM Walker ««*« m Ois Hxtloo, oj 
•A Company 1920 

Slaltsltcat Method Chapter V B treaie 1 together in mo^t statistics texts 

m .t. pr... . 





Let us examine some other factors which normally vary together. There
is a substantial, but again by no means perfect, positive correlation between
intelligence test scores and average grades earned during the freshman
year of college. The higher the score obtained by the entering freshman,
the higher his grades are likely to be. The lower an individual's score,
the poorer a student he will probably make. This relationship has been
found with all sorts of intelligence tests used in a great number of different
types of colleges ever since such tests first became available commercially
shortly after the close of World War I.*

Husbands and wives tend to be more like each other with respect to age,
amount of education, and many other factors, than they are like people
in general. The sons of tall fathers tend to be taller than average, and the
sons of short fathers tend to be short. Likewise, the fathers of tall sons tend
to be of above-average height. Children resemble their own parents in intelligence
more closely than they resemble other adults. Positive correlation
between members of families is usually found for almost any characteristic
from algebra ability to knowledge of zoology.
Using Galton's ideas concerning trait resemblances, Pearson devised as
a measure of relationship the product-moment coefficient of correlation, r.
Since about 1900 this has been a widely employed statistic. In the testing
field it has become almost indispensable. Virtually all test manuals are
plentifully sprinkled with r's, as is most educational literature. The classroom
teacher frequently encounters r's in his reading and conversation.
For these reasons, both the concept and the computational procedure are
well worth mastering.

Pearson's original r and several other related r's summarize the magnitude
and direction of the relationship between two sets of measurements,
such as height and weight based upon the same persons, or between measurements
on pairs of persons, like the fathers and sons mentioned above.
It makes no difference whether the variates are history grades and geography
grades, or speed of running the hundred-yard dash and skill in playing
the violin, or speed of tapping and age. In every situation, r can have
values that range from −1 for a perfect inverse relationship through 0 for
no systematic correlation to +1 for perfect direct relationship, and the r's
between radically different kinds of variates are wholly comparable. For
example, it is meaningful to say that reading ability and intelligence are
more closely related than height and weight.
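
The behavior of r at its limits can be seen from a short Python sketch of the product-moment formula, written directly from the deviations of each variate about its own mean. The function name `pearson_r` and the heights and weights are hypothetical and serve only as an illustration; this is not the scatter-diagram procedure used for hand computation.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment coefficient of correlation between paired measures."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("two equally long series of at least two pairs are needed")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross products and sums of squares of the deviations.
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when one variate does not vary")
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical heights (inches) and weights (pounds) of six persons.
heights = [60, 62, 65, 68, 70, 72]
weights = [115, 120, 140, 135, 160, 170]

print(round(pearson_r(heights, weights), 2))       # positive, but not perfect
print(round(pearson_r([1, 2, 3], [2, 4, 6]), 2))   # perfect direct relationship
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 2))   # perfect inverse relationship
```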

In order to compute r without an electric calculator, it is helpful to have
a scatter diagram such as Table 8 on page 68. However, those scores are
too simple to illustrate computational procedures well, since not more than
* One of the most complete summaries is H. F. Garrett, "A Review and Interpretation
of Investigations of Factors Related to Scholastic Success in Colleges of Arts
and Science and Teachers Colleges," Journal of Experimental Education, pp. 91-138,
December, 1949.





a single individual falls into any one cell of the scattergram. The r between
MA and EA for Table 8 is .65, which indicates a moderate positive relationship.

TABLK 16 





Two urgent cautions are imperative. First, r cannot be interpreted directly
as a percentage. An r of 0 represents no relationship at all, but an r of .65
does not mean 65 per cent relationship. In one sense, the difference in
degree of relationship represented by r's of .91 and .98 is as great as the
difference between r's of 0 and .65. As r's get larger, a small gain indicates
a considerable increase in the degree of correlation. Therefore, an r of .66
indicates more than twice the relationship shown by an r of .33. These
relative magnitudes are depicted graphically in Figure 4,* where the curve
for negative r's is symmetrical with the one for positive r's. Thus a given r
indicates a high degree of relationship if its magnitude, regardless of sign,
is large. An r of −.72 denotes just as strong an inverse relationship as an r
of .72 indicates direct covariation; both have equal predictive value. Note
that the two curves of Figure 4 are approximately linear through about
the first fourth of the r scale, after which they become increasingly positively
accelerated. An r of .24 corresponds to a z of .24, while an r of .995
corresponds to a z of 3.00.
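
The z values plotted in Figure 4 are Fisher's transformation of r, which is the inverse hyperbolic tangent and is available in Python's standard library as `math.atanh`. The sketch below, in which the function name `fisher_z` is an illustrative choice, reproduces the comparisons made in the paragraph above.

```python
from math import atanh

def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return atanh(r)   # equals 0.5 * log((1 + r) / (1 - r))

for r in (0.24, 0.33, 0.65, 0.66, 0.91, 0.98, 0.995):
    print(f"r = {r:<5}  z = {fisher_z(r):.2f}")

# The gain from r = .91 to r = .98, measured on the z scale.
gain_high = fisher_z(0.98) - fisher_z(0.91)
# The gain from r = 0 to r = .65, measured on the same scale.
gain_low = fisher_z(0.65) - fisher_z(0.0)
print(round(gain_high, 2), round(gain_low, 2))
```

On this scale the step from .91 to .98 is about as large as the step from 0 to .65, and the z for .66 is more than twice the z for .33.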

The second warning is that correlation does not necessarily mean causation.
Often variables other than the two under consideration are responsible for
the association. Furthermore, problems in the social sciences, the field in
which correlation is most used, are usually too complex to be explained
in terms of a single cause.

Let us take several examples. It is probably true that in the United
States there is moderate positive correlation between the average salaries
of teachers in various high schools and the percentages of their graduates
who go on to college, but to say that these students attend college because
their teachers are well paid is as inaccurate as to say that their teachers
are well paid because many of the graduates attend college. The situation
is complex, but one prominent factor is the financial condition of the
community, which to a considerable extent determines ability to pay both
teachers' salaries and college expenses.

Furthermore, it has been found that the percentage of "dropouts"
occurring in high schools varies inversely with the number of books per pupil
in the libraries of those schools. But common sense tells us that piling
more books into the library will hardly affect the dropout rate, nor will
getting a better attendance officer bring about a magical increase in the
number of books.

Failure to recognize the non-causal nature of correlation is, in its broadest
sense, a widespread logical error, for the fundamental notions of
co-relationship affect our lives at many points. Going to Sunday School is
generally agreed to be valuable from many standpoints, but a positive
relationship between the rate of attendance and a characteristic such as
honesty does not necessarily imply that children are honest because they
attend Sunday School. Underlying both attendance and honesty may be
"home training," for example. A really crucial test of the hypothesis that
attending Sunday School "makes" children more honest would have to be
experimental rather than correlational.

Note carefully that while correlation does not directly establish a
"causal" relationship, it may furnish clues to causes, and these can be
taken advantage of in planning controlled experimentation. Therefore, r is
useful primarily for exploratory purposes. Hence, it is employed much more
widely in the newer sciences such as sociology, psychology, and education
than in physics and chemistry.

Based upon Ronald A. Fisher, Statistical Methods for Research Workers (Tenth
Edition), Table V.B, page 210. London: Oliver and Boyd, 1946. The vertical
(ordinate) figures ranging from 0 to 3.0 are Fisher's z transformation values
of the r scale. They are not the same as the z-scores mentioned on page 85.

For several such relationships see Guy T. Ferrell, High School Holding Power:
An Analysis of Certain Internal Factors. Unpublished Ph.D. Dissertation,
George Peabody College for Teachers ...

[Figure 4: The Relative Amount of Relationship Represented ...]

Various ways to obtain r. There are many devices for computing r.
One of the simplest methods requires an electric calculator and utilizes the
"raw" scores themselves, without any frequency distributions or
scattergrams. However, it is seldom feasible unless a calculator is available,
because the arithmetical operations involve large numbers and therefore
become quite tedious. Almost all other computational procedures require
a scattergram. Many routinized "correlation charts" are available
commercially, and nearly every statistics teacher has his own pet version,
usually to some extent original.
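The raw-score method mentioned here is easy to carry out by machine. The sketch below assumes the usual raw-score form of the product-moment formula, r = [NΣXY - (ΣX)(ΣY)] / √{[NΣX² - (ΣX)²][NΣY² - (ΣY)²]}; the chapter does not print this form in the passage above, so the formula and the name `pearson_r` are assumptions made for illustration. The five sample pairs are legible entries from Table 17.

```python
import math


def pearson_r(xs, ys):
    """Product-moment r computed directly from paired raw scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists of at least 2 scores")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Five (pretest, midterm) pairs taken from Table 17, for illustration only.
pretest = [62, 53, 62, 49, 62]
midterm = [65, 56, 56, 54, 56]
print(round(pearson_r(pretest, midterm), 2))
```

The large intermediate sums (ΣXY, ΣX², ΣY²) are exactly what makes the method tedious without a calculator.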

This diversity of computing aids is probably indicative of basic
difficulties inherent in the process of computing r "by hand." There is not any
really simple way to do it. The chief difficulty is that r indicates a
relationship between paired scores, so in order to obtain r without undue labor one
must by some indirect method find the sum of the products of the paired
scores. That is the intent of every simplified procedure.

One complicating factor in the attempt to simplify the computation of r
is that in order to make the numbers involved as small as possible, most
chart designers set up procedures that result in many negative numbers.
Negative numbers are likely to confuse the average student, while large
positive ones are tedious to handle. Both invite sizable errors. In the
writers' opinion, it is better for persons who are getting their introduction
to r in this book to work with somewhat larger positive numbers rather
than with smaller negative ones, since, thereby, the explanation is simplified
considerably and the chances of misunderstanding the procedure reduced.
If you already know how to compute r from a scattergram by some other
method and know it well, use it rather than the one described below.

Constructing the scattergram. In order to get some "live" data, one
of the authors took 43 sets of classroom test scores from his roll book.
These are shown in Table 17. Note there that the X scores represent number
of points (not percentage right) on a pretest at the beginning of the
quarter, while the Y scores were secured at midterm.

TABLE 17

Pretest and Midterm Scores of 43 Graduate Students on Two Teacher-made
Objective Tests in Intermediate Statistics

          Pretest   Midterm                  Pretest   Midterm
Student   Score     Examination    Student   Score     Examination
          (X)       Score (Y)                (X)       Score (Y)

a         62        51             v         62        65
b         5?        66             w         53        56
c         53        40             x         62        56
d         49        38             y         49        54
e         46        5?             z         62        56
f         ?         57             aa        44        39
g         32        42             bb        60        49
h         42        35             cc        47        65
i         ?7        ?              dd        44        45
j         ?         46             ee        49        39
k         44        33             ff        36        38
l         46        58             gg        40        15
m         37        48             hh        53        50
n         57        44             ii        44        47
o         ?         55             jj        49        41
p         57        44             kk        49        35
q         58        59             ll        44        43
r         58        45             mm        53        60
s         ?         51             nn        53        52
t         40        32             oo        42        35
u         73        54             pp        57        45
                                   qq        49        41

The highest score on the X test is 73 and the lowest is 32. With a wider
interval there are only 11 classes, so we need to use the smaller interval,
3, with 15 classes running from 30-32 (30 is a multiple of 3) to 72-74.
Repeating this procedure for the Y distribution, where the scores
range from 66 to 15, gives an interval of 4. The 14 classes run
from 12-15 (12 is a multiple of 4) to 64-67.

Now construct a 15 X 14 scattergram like Table 18. For each student in
Table 17, put a tally (/) in the proper cell: find his X score along the top
of the scattergram and go down that column until you are directly opposite
his Y score. Put a tally in that cell. When you have done this for all the
students you will have a total of 43 tallies, for there are 43 pairs of scores.

... described in Quinn McNemar, Psychological Statistics ...

TABLE 18

[Scattergram of the 43 pairs of scores in Table 17, 15 X classes by 14 Y
classes, with the computation of r. The denominator of the r formula is
(132)(111) = 14,652, and r = .53.]

For example, the lonesome-looking entry in the cell designating an X
score of 39-41 and a Y score of 12-15 indicates the student who earned
40 points on the pretest but only 15 at midterm. The highest score on the
pretest was 73, earned by the person who made 54 on the midterm exam;
this is shown by the tally at the far right.

After the tallying is over, change the tallies to numbers.
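The tallying step can be sketched in a few lines. This is an illustrative sketch only: the function names `class_index` and `tally_scattergram` are hypothetical, and the class limits (X classes of width 3 starting at 30, Y classes of width 4 starting at 12) are the ones described in the text.

```python
from collections import Counter


def class_index(score, lowest_limit, interval):
    """Number of whole classes between the lowest class and this score's class."""
    return (score - lowest_limit) // interval


def tally_scattergram(pairs, x_low=30, x_interval=3, y_low=12, y_interval=4):
    """Count how many (X, Y) pairs fall into each cell of the scattergram.

    Keys are (d_x, d_y): column and row positions counted from the lowest
    X class and the lowest Y class, both starting at 0.
    """
    cells = Counter()
    for x, y in pairs:
        d_x = class_index(x, x_low, x_interval)
        d_y = class_index(y, y_low, y_interval)
        cells[(d_x, d_y)] += 1
    return cells


# Three legible pairs from Table 17, including the "lonesome-looking" (40, 15).
cells = tally_scattergram([(40, 15), (73, 54), (62, 65)])
for cell, frequency in sorted(cells.items()):
    print(cell, frequency)
```

Counting positions from 0 at the lowest class matches the deviations from the arbitrary origin that are used when r is computed from the scattergram.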
Computing r from the scattergram. Now you are ready to find r.
Set up the four new rows at the top of the scattergram and the six
columns at the right. The first row is for f_x, the frequency of X scores in
each column of the scattergram. These are 1, 1, 1 + 1 = 2, 1 + 1 = 2,
2 + 1 + 1 + 3 = 7, and so forth. The first column to the right of the
scattergram is for f_y, the frequency of Y scores in each row of the
scattergram. These are, from top to bottom, 3, 1 + 1 = 2,
1 + 1 + 1 + 2 + 1 = 6, and so on.

The second row at the top, set in bold-face type and labeled d_x, shows
deviations from the arbitrary origin, which in this case is (30 + 32)/2 = 31,
the midpoint of the lowest X class. These d_x values start at 0 and go by
1's up to 14. The d_y column at the right begins at the bottom with 0 and
goes by 1's up to 13. Obviously, the numbers continue by 1's until the
highest class is reached. How high they go depends upon the number of
classes for the X and Y variables in the scattergram. The highest number
for d_x will always be one less than the number of X classes, and the top
number for d_y will always be one less than the number of Y classes.

The rest of the procedure is self-explanatory, if the meaning of the
symbols is understood. The third (56-59) row of the scattergram will serve
to show how the "f X d_x row sums" are obtained. There are frequencies
in five different cells of that row. These are f's for the row. To get the
f X d_x row sum for this third row we take the 1 farthest left and multiply
it by its d_x of 5. The next 1 is multiplied by its d_x of 7, the next 1 by 9,
the 2 by 10, and the 1 farthest right by 12. (1 X 5) + (1 X 7) + (1 X 9)
+ (2 X 10) + (1 X 12) = 53, the figure shown in the "f X d_x row sums"
column.

There are two checks in the table. The sum of the f_x row at the top should
equal the sum of the f_y column at the right, since both equal N, the number
of pairs of scores. Also, the third row at the top sums to Σf_x d_x, which
should equal the sum of the "f X d_x row sums" column at the right.
Because there are no other automatic checks, all computations should be
gone over carefully, preferably by someone besides the original computer.

Substituting in the formula is straightforward, if one simply follows the
symbols. Each of the square roots in the denominator should be carried
out to four figures and rounded back to three for computational ease (see
Appendix D, pages 456-458). You may find a table of square roots helpful
at this point. The final r should not usually be reported to more than
two figures, unless it is based upon several hundred individuals.
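The scattergram arithmetic can be sketched as follows. Because the formula printed with Table 18 is not legible here, the sketch assumes the standard product-moment formula written in coded deviations, r = [NΣf·d_x·d_y - (Σf_x d_x)(Σf_y d_y)] / {√[NΣf_x d_x² - (Σf_x d_x)²] · √[NΣf_y d_y² - (Σf_y d_y)²]}; the function name and the dictionary layout are hypothetical.

```python
import math


def r_from_scattergram(cells):
    """r from a scattergram given as {(d_x, d_y): frequency}.

    d_x and d_y are class positions counted by 1's from the lowest X and
    Y classes (the arbitrary origins), so no negative numbers occur.
    """
    n = sum(cells.values())
    sum_fdx = sum(f * dx for (dx, dy), f in cells.items())
    sum_fdy = sum(f * dy for (dx, dy), f in cells.items())
    sum_fdx2 = sum(f * dx * dx for (dx, dy), f in cells.items())
    sum_fdy2 = sum(f * dy * dy for (dx, dy), f in cells.items())
    sum_fdxdy = sum(f * dx * dy for (dx, dy), f in cells.items())
    numerator = n * sum_fdxdy - sum_fdx * sum_fdy
    root_x = math.sqrt(n * sum_fdx2 - sum_fdx ** 2)
    root_y = math.sqrt(n * sum_fdy2 - sum_fdy ** 2)
    return numerator / (root_x * root_y)


# The "f X d_x row sum" for the third (56-59) row described in the text:
# frequencies 1, 1, 1, 2, 1 standing under d_x values 5, 7, 9, 10, 12.
third_row = {5: 1, 7: 1, 9: 1, 10: 2, 12: 1}
row_sum = sum(f * dx for dx, f in third_row.items())
print(row_sum)
```

The class intervals (3 for X, 4 for Y) do not appear in the formula, since r is unchanged when each variable is divided by a constant.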

Computing the two means from scattergram data. On page 79 a
formula for the mean was given as

\[ M = M' + i\,\frac{\sum fd}{N}. \]

For the X scores in Table 18, the assumed mean, M' (or, more properly in this
example, the arbitrary origin), falls at the middle of the 30-32 class, which
is (30 + 32) divided by 2, or 31. The formula for the mean of the X
distribution is

\[ M_x = M'_x + i_x\,\frac{\sum f_x d_x}{N} = 31 + 3 \times \frac{\sum f_x d_x}{43}. \]

Similarly, the mean of the Y distribution is

\[ M_y = M'_y + i_y\,\frac{\sum f_y d_y}{N} = \frac{12 + 15}{2} + \frac{4 \times 360}{43} = 13.5 + 33.5 = 47.0. \]

Of course, this does not mean that the students knew less at the midterm
than when they began the course. Had the Y test been given at the first
of the quarter, the average score would probably have been only slightly
above zero, since the midterm examination covered much more advanced
material than the pretest.

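Computing the two standard deviations from scattergram data.
The formula for obtaining the standard deviation from grouped data that
was explained on page 83 quickly yields standard deviations for the X
and Y distributions, since the two square root values have already been
obtained as the denominator, (132)(111), in the Table 18 computation of r.

\[ \sigma_x = \frac{i_x \sqrt{N\sum f_x d_x^2 - \left(\sum f_x d_x\right)^2}}{N} = \frac{3 \times 132}{43} = \frac{396}{43} = 9.2 \]

\[ \sigma_y = \frac{i_y \sqrt{N\sum f_y d_y^2 - \left(\sum f_y d_y\right)^2}}{N} = \frac{4 \times 111}{43} = \frac{444}{43} = 10.3 \]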
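The grouped-data mean and standard deviation reduce to one line of arithmetic each. The sketch below simply re-does the computations with the figures quoted in the text (Σf_y d_y = 360, square-root terms 132 and 111, N = 43); the function names are hypothetical, and the grouped formulas are assumed to be M = M' + iΣfd/N and σ = i√[NΣfd² - (Σfd)²]/N.

```python
def grouped_mean(origin, interval, sum_fd, n):
    """Mean from coded deviations: arbitrary origin plus interval * (sum of f*d) / N."""
    return origin + interval * sum_fd / n


def grouped_sd(interval, root_term, n):
    """Standard deviation from coded deviations.

    root_term is sqrt(N * sum(f*d^2) - (sum(f*d))^2), already obtained
    as part of the denominator of r.
    """
    return interval * root_term / n


N = 43
mean_y = grouped_mean(origin=(12 + 15) / 2, interval=4, sum_fd=360, n=N)
sd_x = grouped_sd(interval=3, root_term=132, n=N)
sd_y = grouped_sd(interval=4, root_term=111, n=N)
print(round(mean_y, 1), round(sd_x, 1), round(sd_y, 1))
```

Reusing the two square-root terms is what makes the standard deviations nearly free once r has been computed.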

Determining the medians. It may be helpful at this point to review
the computation of the median by obtaining this statistic for the X and
Y distributions by the method explained on pages 76-77. The median is the
50th percentile. Fifty per cent (or half) of 43 is 21.5. Look at the first row
at the top of the scattergram and count f_x frequencies from left to right:
1 + 1 + 2 + 2 + 7 + 3 = 16. This leaves you "suspended" between the
X classes of 45-47 and 48-50, or at 47.5. Now count 21.5 - 16 = 5.5 units
into the 48-50 class, which has a frequency of 7 and an interval of 3
(because it includes scores of 48, 49, and 50): (5.5/7) X 3 = 2.4, and
47.5 + 2.4 = 49.9, the median of the X (pretest) distribution. Check by
counting the other way: 50.5 - (1.5/7 X 3) = 49.9.

The median of the Y (midterm) distribution is obtained similarly by
counting up: 43.5 + (6.5/7 X 4) = 47.2. Check by counting down:
47.5 - (0.5/7 X 4) = 47.2.
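The interpolation used for the medians can be written as a small function. This is a sketch under the assumption that the median is located by linear interpolation within the class that contains the N/2-th case; the function name and argument names are hypothetical, and the numbers are those worked out in the text.

```python
def grouped_median(lower_real_limit, count_below, class_frequency, interval, n):
    """Median by interpolation within the class containing the N/2-th score.

    lower_real_limit: lower real limit of that class (e.g. 47.5 for 48-50)
    count_below:      number of scores falling below that limit
    class_frequency:  number of scores inside the class
    interval:         width of the class
    n:                total number of scores
    """
    needed = n / 2 - count_below          # how far into the class to count
    return lower_real_limit + needed / class_frequency * interval


# X (pretest): 16 scores lie below 47.5; the 48-50 class holds 7 scores.
median_x = grouped_median(47.5, 16, 7, 3, 43)
# Y (midterm): 15 scores lie below 43.5; the 44-47 class holds 7 scores.
median_y = grouped_median(43.5, 15, 7, 4, 43)
print(round(median_x, 1), round(median_y, 1))
```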

The Y median of 47.2 is about the same as the Y mean of 47.0.

When the measures to be correlated are consecutive, untied ranks
(1, 2, 3, and so on), the coefficient of correlation may be computed from
the formula

\[ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}, \]

where D is the positive difference between paired ranks and N the number
of pairs. An illustration will show how simple the computational
procedure is.

A certain test question required that six historical events be ranked in
chronological order, 1 being the earliest and 6 the most recent, without
ties. Table 19 shows the ranks assigned by Richard Roe and John Doe,
together with the correct ranks. The first r is secured by squaring the
differences between Richard Roe's ranks and the correct ranks and
substituting in the formula. This r, usually called rho and written as ρ,
equals -.14. Like r, rho can vary from -1 through 0 to +1.
The negative rho found for Richard probably indicates poor guessing
rather than actual misinformation. Note that John Doe, with a rho of .83,
seems to have had a fairly good general knowledge of the correct
chronology, even though he "missed" every one of the six ranks.

TABLE 19

Events to Be Ranked    John Doe's    Correct
(1 = earliest)         Ranks         Ranks      D     D²

...                    5             6          1     1
...                    4             3          1     1
...                    2             1          1     1
...                    3             4          1     1
...                    1             2          1     1
...                    6             5          1     1

                                              ΣD² = 6

\[ \rho = 1 - \frac{6 \times 6}{6 \times 35} = 1 - .17 = .83 \]


TABLE 20

The Various Values of Rho (ρ) for All Possible Sums of Squared
Deviations (ΣD²) for N's from 2 through 10

\[ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} \]

[Table body: one row for each even value of ΣD², one column for each N
from 2 through 10; each entry is the value of ρ, to two decimal places,
given by the formula above.]






Table 20 makes the computation of rho itself unnecessary when the
number of pairs is less than 11. Just compute ΣD², look in Table 20 for
this figure, move over to the appropriate N column, and read the value of
rho there. For example, the ΣD² for Richard Roe was 40. Looking in the
ΣD² column at 40 and then to the right under the N of 6, we find that rho
is -.14, which agrees with the value computed in Table 19.
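A lookup table of this kind can be regenerated directly from the rho formula. The sketch below is illustrative, not a reproduction of the printed table: the names `rho` and `rho_table` are hypothetical, and it simply lists every even ΣD² up to the largest value N(N² - 1)/3 (ranks in exactly reverse order), whether or not each value can actually occur for a given N.

```python
def rho(sum_d2, n):
    """Spearman's rho from the sum of squared rank differences and N pairs."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def rho_table(max_n=10):
    """Map (sum of D^2, N) -> rho rounded to two places, for N = 2..max_n."""
    table = {}
    for n in range(2, max_n + 1):
        largest = n * (n * n - 1) // 3   # reached when the ranks are exactly reversed
        for sum_d2 in range(0, largest + 1, 2):
            table[(sum_d2, n)] = round(rho(sum_d2, n), 2)
    return table


table = rho_table()
print(table[(40, 6)])   # Richard Roe: sum of D^2 = 40, N = 6
print(table[(6, 6)])    # John Doe: sum of D^2 = 6, N = 6
```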

A simplified scoring procedure for "sequence" items is described in
Appendix C on page 435.

Strictly speaking, the rho formula is not appropriate when any ties occur.
It is sometimes used as a short-cut procedure for estimating the r between
scores by first changing the two sets of scores to ranks. If only an
approximate measure of relationship is desired, this method may yield satisfactory
results despite ties. It will usually be tedious when N is as great as 30,
however, for the ranking process will be time-consuming and the fractional
ranks will be hard to square. A worked example appears in Table 21. It is
based upon 20 pairs of scores in Table 7, page 67, for which the r from
Table 8 is .65. Note that rho is .67, a discrepancy of only .02.
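The short-cut of changing scores to ranks, with tied scores sharing the average of the ranks they occupy, might be sketched as follows. The function names are hypothetical, and the averaging rule for ties is an assumption consistent with the fractional ranks mentioned in the text.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest.

    Tied scores each receive the mean of the rank positions they occupy,
    which produces fractional ranks such as 11.5.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2   # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rho_from_scores(xs, ys):
    """Approximate r by applying the rho formula to the ranks of two score lists."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    n = len(xs)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


print(average_ranks([30, 20, 20, 10]))
```

With ΣD² = 445 and N = 20, as reported for the worked example, the same formula gives 1 - (6 × 445)/(20 × 399), or about .67.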



TABLE 20 (Continued)

[Table body, continued: rows for ΣD² from 170 through 330, with columns
for N = 9 and N = 10. ρ reaches -1.00 at ΣD² = 240 for N = 9 and at
ΣD² = 330 for N = 10.]


TABLE 21

Estimating the Coefficient of Correlation by the
Spearman Rank Difference Method

(Ranks of Educational Age, EA, and Mental Age, MA, for the 20 pairs of
scores in Table 7)

     Ranks             Differences in Ranks
  EA        MA            D          D²

  1         2             1          1
  2         1             1          1
  3         3             0          0
  4.5       8.5           4          16
  4.5       17            12.5       156.25
  6         6             0          0
  7         8.5           1.5        2.25
  8         5             3          9
  9         10            1          1
  11.5      17            5.5        30.25
  11.5      7             4.5        20.25
  11.5      14            2.5        6.25
  11.5      11.5          0          0
  14        15            1          1
  15        4             11         121
  16        20            4          16
  17        11.5          5.5        30.25
  18.5      17            1.5        2.25
  18.5      13            5.5        30.25
  20        19            1          1

  N = 20                          ΣD² = 445

\[ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 445}{20[(20)^2 - 1]} = 1 - .33 = .67 \]

Interpreting the coefficient of correlation. In interpreting the
coefficient of correlation, two things must be considered. The first is the sign
of the coefficient. The sign indicates the direction of the relationship.
Positive coefficients indicate direct relationship; that is, there is a tendency
for the two series of values to vary in the same direction, high values in one
column being associated with high values in the other column, low values
in one column being associated with low values in the other column, and
so on. On the other hand, negative coefficients indicate inverse
relationship; that is, there is a tendency for the two series of values to vary in
opposite directions, high values in one column being associated with low
values in the other column, and high values in that column being associated
with low values in the first column.

Another thing is equally important and far more difficult to interpret,
that is, the magnitude or size of the coefficient. The size of the coefficient
indicates the degree or closeness of the relationship, just as the sign of the
coefficient indicates the direction of the relationship. The minimum
coefficient is .00, which indicates no consistent relationship whatsoever. From
this minimum value the coefficients increase in both directions until -1.00
is reached for one limit and 1.00 for the other. It should be noted that
-1.00 and 1.00 indicate equally close relationship, for both are perfect.
Their one difference is in direction, the former being inverse and the latter
direct. In like manner, all other values of the same size, whatever their
sign, indicate equal degrees of relationship. It is the size, not the sign, of
the coefficient that gives the clue to the closeness or degree of relationship.

The problem, then, is to determine how close a relationship is indicated
by a coefficient of a given magnitude, regardless of sign. For example, how
close a relationship is indicated by a coefficient of .60? Unfortunately,
there is no simple way of answering such a question. Attempts to indicate
this by a descriptive adjective, such as "high" or "marked," are vague and
often misleading, to say the least. As a matter of fact, a coefficient of .60
might be regarded as high for one type of situation and low for another.
For example, a coefficient of .60 between a general intelligence test
administered at the beginning of the year and school marks recorded at the
end of the year might be regarded as high, because such coefficients usually
fall below that. But a coefficient of .60 between scores on two forms of this
intelligence test administered the same day would be unusually low. In
other words, "high" and "low" have only relative meaning. Before an
interpretation can be made of a coefficient on this basis, the reader must at
least know what the central tendency of such coefficients for similar data is.

Expectancy tables. A most helpful way to interpret the degree of
relationship between two variables is to construct an expectancy table from the
scatter diagram and inspect it carefully. This may be done with the
Table 18 data on page 92 by calling scores above 50 on the pretest and
scores higher than 47 on the midterm exam "above average," as shown in
Table 22. These cutting points give as near a 50 per cent split of the scores
as possible. Twenty-three persons are "below average" on the pretest and
20 "above average." Twenty-two students are "below average" on the
midterm examination and 21 "above average."

TABLE 22

A Simple Expectancy Table Based Upon the 43 Pairs of Scores in the
Table 18 Scattergram, for Which r = .53

                              Below Average     Above Average
                              on Pretest        on Pretest
                              (30-50)           (51-74)          Sums

Above Average at Midterm      7/23 = 30%        14/20 = 70%      21
(48-67)

Below Average at Midterm      16/23 = 70%       6/20 = 30%       22
(12-47)

Sums                          23                20               43

Of those students who were below average on the pretest, 70 per cent
were also below average on the midterm examination, so the odds are 7:3
that a person scoring below average initially will six weeks later again score
below average. The same 7:3 odds hold for those who score above average
on the pretest; this will not necessarily be precisely true in other similar
problems, because the number of persons "above average" may differ
considerably for the two tests.

The 7 individuals who went from below average on the pretest to above
average at midterm might be called "false negatives," since at first they
were classified too low, while the 6 who went from above average to below
average might be labeled "false positives." Only 3 of the 7 false negatives
changed greatly; the other 4 were not very low on the pretest or very high
at midterm. Likewise, none of the 6 false positives were very high on the
pretest or very low at midterm. Obviously, though the four-cell expectancy
table provides a useful summary, it is less exact than the scattergram.
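A four-cell expectancy table of this kind can be tallied mechanically from paired scores. The sketch below is illustrative only: the function names are hypothetical, the cutting points (above 50 on the pretest, above 47 at midterm) are the ones used in the text, and the four sample pairs are legible entries from Table 17 rather than the full set of 43.

```python
def expectancy_table(pairs, x_cut, y_cut):
    """Count pairs in the four cells formed by two cutting points.

    A score counts as "above" when it is strictly greater than its cut.
    Keys are (pretest status, midterm status).
    """
    counts = {(col, row): 0
              for col in ("below", "above")
              for row in ("below", "above")}
    for x, y in pairs:
        col = "above" if x > x_cut else "below"
        row = "above" if y > y_cut else "below"
        counts[(col, row)] += 1
    return counts


def per_cent(part, whole):
    """Whole-number percentage, as entered in the cells of the table."""
    return round(100 * part / whole)


sample = [(40, 15), (73, 54), (49, 54), (62, 51)]
print(expectancy_table(sample, x_cut=50, y_cut=47))

# The cell percentages reported for the full set of 43 pairs:
print(per_cent(16, 23), per_cent(7, 23), per_cent(14, 20), per_cent(6, 20))
```

Percentages are taken within each pretest column, so that each column shows the odds of ending up above or below average at midterm.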

Validity and reliability coefficients. One of the most important uses
of the coefficient of correlation is in determining the validity of a test. There
are two types of validity, or rather two methods of judging the validity of a
test, namely, curricular and statistical. The former is subjective and the
latter is objective. Curricular validity is determined by examining the
content of the test itself and judging the degree to which it is a true measure of
the important objectives of the course, or a truly representative sampling of
the essential materials of instruction. Statistical validity is determined by
setting up a criterion of the thing which it is desired to measure and then
computing the coefficient of correlation between the test scores and the
criterion. The r so obtained is called a validity coefficient, and is interpreted
like any other coefficient of correlation.

A second use of the coefficient of correlation is in determining the
reliability of a test. Since reliability is the degree of consistency with which
the test measures whatever it does measure, several ways to determine
reliability are to be found in computing the coefficient between two forms
of the same test, two matched halves of a test, or two administrations of
the same test.

The r discussed in this section, called the zero-order coefficient of
correlation, is only one of many different types of correlation coefficients.


G. Measures of Error

Errors in educational measurement may be grouped conveniently into
three types according to source:

1. Errors of technique
   a. Arithmetical errors in computation or the like
   b. Use of inappropriate measures

2. Errors of measurement
   a. Imperfect measuring instruments
   b. Unskilled tester
   c. Fluctuations in the persons measured

3. Errors of sampling
   a. Selection or bias in sampling
   b. Chance fluctuations in random sampling


The method described above is the simplest. See Test Service Bulletin
No. ..., December ..., "... A Way of ...," copies of which may be obtained
free from the Psychological Corporation, ... Fifth Avenue, New York 18,
New York.





Errors of technique. Obvious types of errors are mistakes in adding
scores and various computational errors in statistical analysis. The only
protection against such errors is the exercise of great care. Likely to be
more serious are the errors due to the use of inappropriate measures for
the data in hand. It is poor technique to introduce more refined measures
than the data warrant or the purpose requires. All statistical formulas are
based upon certain assumptions which often are not fully met in actual
practice. The following are common examples. Computations based upon
data in a grouped frequency distribution depend for complete accuracy
on the assumption that the scores are uniformly distributed within the
several intervals or that the midpoint of each interval may be used to
represent the average value of all scores in the interval. The Pearson r
assumes linear correlation between the two variables, that is, constancy of the
relationship throughout the range of scores. Many formulas are based on the
assumption of a normal distribution of the measures. Whenever the data
in a given situation fail to conform to these assumptions, certain errors are
introduced. Fortunately, in actual practice, these errors are often not great
enough to introduce serious errors of interpretation. But gross errors due
to the use of inappropriate techniques are sufficiently numerous to warrant
extreme caution. Furfey and Daly made a study of articles containing
product-moment r's and came to the conclusion that this technique is
employed "with little regard to the fulfillment of the necessary antecedent
conditions." In fact, in 60 of the 63 articles studied they found that "their
authors have left themselves open to the suspicion of having employed the
correlation technique in a way which is meaningless, if not positively
misleading."

Errors of measurement. There are many possible sources of errors in
measurement, even when there are no computational errors and when the
most appropriate statistical analysis has been employed. In the first place,
no measuring instrument is perfectly valid or perfectly reliable. In the
second place, the personal equation of the examiner must be reckoned with.
Inexperienced examiners may allow too much or too little time in
administering the test, or may otherwise depart from standardized procedure in
administering the test or in scoring the papers. In the third place, there is
likely to be great variability in the responses of the subjects taking the test.
Accidental occurrences, such as the breaking of a pencil point on timed
tests, fluctuations in motivation, fatigue, and other physical and mental
factors may seriously affect the test results.

It will be noted that some errors of measurement are systematic and
tend to affect all individuals alike. Allowing too much or too little time
on a test of reading speed is an example. On the other hand, many errors
of a variable character occur, affecting the individuals unequally or in
different directions. The effects of these errors are presented briefly in
Table 23.

Paul H. Furfey and Joseph F. Daly, "Product-moment Correlation as a
Research Technique in Education," Journal of Educational Psychology,
26: 206-211, March 1935.

TABLE 23

Effects of Constant and Variable Errors on Certain Types of Statistics

Measure              Constant Errors              Variable Errors

Central Tendency     Increased or decreased       ... balance each other
                     by amount of the error

Variability          Little or no effect          Usually made too large

Relationship         Little or no effect          Usually made too small

It will be observed that constant errors affect measures of central
tendency most seriously, and often there are no methods for correcting this bias.
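The pattern in Table 23 can be illustrated with a small simulation. Everything in this sketch is hypothetical: the "true" scores are artificial normal deviates, the constant error is a flat 5 points added to every score, and the variable error is random noise with a standard deviation of 8. It is meant only to show the direction of the effects, not their size in any real test.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cross / (sd(xs) * sd(ys))


rng = random.Random(0)                                   # fixed seed: repeatable run
true_x = [rng.gauss(50, 10) for _ in range(2000)]        # artificial "true" scores
criterion = [x + rng.gauss(0, 10) for x in true_x]       # a related second measure

constant = [x + 5 for x in true_x]                       # same error for everyone
variable = [x + rng.gauss(0, 8) for x in true_x]         # error differs by person

for label, scores in (("true", true_x), ("constant", constant), ("variable", variable)):
    print(label, round(mean(scores), 1), round(sd(scores), 1),
          round(pearson_r(scores, criterion), 2))
```

The constant error moves the mean by exactly its own amount and leaves the standard deviation and r untouched, while the variable error leaves the mean about where it was, enlarges the standard deviation, and lowers r.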
Errors of sampling. It is usually impractical to measure all the cases
of a given type. For example, it would be a formidable task to obtain the
mean IQ of all high-school freshmen in a state, or the difficulty of each
word in a series of textbooks. Fortunately, it is not necessary to do so.
It has been found possible to estimate the range within which the true
measure probably lies. But to do so, one needs a representative sampling
of the total population. Against errors in a selected or "hand-picked"
sampling there is no statistical protection. An adequate sampling may be
chosen in a random manner, and the larger the sampling, the better,
although increasing the number of cases does not in itself eliminate the
possibility of error. The sampling method determines whether or not a biased
(non-representative) sample will be obtained.

Further reading. In a stimulating article entitled "Errors, Estimates,
and Samples - the Indispensable Concepts," Charles R. Langmuir makes
clear many points that have only been hinted at in this chapter. Three
other sources of valuable supplementary information are "Making Test
Scores Meaningful," "The Three-Legged Coefficient," and "Reliability
and Confidence." They will do much to help the statistically untrained
teacher or administrator understand important aspects of measurement
theory that might otherwise remain vague.

... 117th St., New York ... December ..., Psychological Corporation. Free.


H. Summary

The following is an outline summary of three important concepts and
some statistics useful in connection with test scores and other quantitative
data.

1. Central tendency

a. The mean, often called the "average" in everyday life, is obtained
by summing all the scores and dividing this sum by the number of scores.
It is for most purposes the best measure of central tendency.

b. The median is a point above which half of the scores lie and below
which the other half lie. Thus it is the 50th percentile, or Q2.

c. The mode is the most frequent score, a rather crude measure.

2. Variability

a. The standard deviation (SD or σ) involves every measure in the
distribution. Approximately two thirds of all scores in a "normal"
frequency distribution lie not more than one standard deviation away from
the mean.

b. D, a percentile measure of dispersion, is the distance between the
90th and 10th percentiles. Four tenths of D (0.4D) provides a fairly good
estimate of the standard deviation.

c. Q, the quartile deviation or semi-interquartile range, is half of the
distance between Q3 (the 75th percentile) and Q1 (the 25th percentile).
Though widely used, Q is usually a poorer measure of variability than D,
which in turn is somewhat inferior to σ.

d. The range is the distance from the lower real class limit of the
lowest class to the higher real class limit of the highest class. For most
purposes it is a very inadequate measure of variability.

3. Correlation, covariation, or concomitant variation. There are numerous
ways of expressing co-relationship. The most common statistic is
Pearson's r. A simplification of r, applicable chiefly to data originally secured
in the form of ranks, is rho (ρ). Both r and ρ have values of -1 for perfect
inverse relationship, 0 for sheer chance association, and +1 for perfect
direct relationship.

Reliability and validity coefficients are usually r's secured under certain
special conditions.

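The measures of central tendency and variability outlined above can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the scores are made up, the function names are hypothetical, and
percentiles are found by simple linear interpolation on the raw scores rather
than from a grouped frequency distribution with real class limits, so the
values may differ slightly from those a grouped-data procedure would give.

```python
import math

def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def percentile(scores, p):
    """p-th percentile (0 to 100) by linear interpolation on sorted raw scores.
    Assumption: raw, ungrouped scores; not a grouped-data method."""
    s = sorted(scores)
    pos = (len(s) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def standard_deviation(scores):
    """Root-mean-square deviation from the mean (divides by N)."""
    m = mean(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

# Hypothetical test scores for 15 pupils.
scores = [52, 55, 58, 60, 61, 63, 64, 66, 67, 69, 70, 72, 75, 78, 83]

m = mean(scores)
median = percentile(scores, 50)                            # 50th percentile, Q2
sd = standard_deviation(scores)
d = percentile(scores, 90) - percentile(scores, 10)        # D
q = (percentile(scores, 75) - percentile(scores, 25)) / 2  # Q

print(f"mean={m:.2f} median={median:.2f} SD={sd:.2f}")
print(f"D={d:.2f} 0.4D={0.4 * d:.2f} Q={q:.2f}")
```

For these scores 0.4D comes out close to the standard deviation, which is the
rule of thumb stated in 2b; with a markedly non-normal set of scores the
agreement may be poorer.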
I. Instructional Test Items

Appendix A, pages 420-435, contains 50 five-option multiple-choice
items covering the material in this chapter. After you have gone through
Chapter 3 carefully, turn to them and test your knowledge. Refer to the
chapter as much as you please.




4


The Characteristics of a Satisfactory
Measuring Instrument


A. Introduction

Importance of the problem. What are the earmarks of a good test,
examination, or other measuring instrument? In the selection of a test,
as in the selection of an automobile, it is important to know what to look
for. There is usually a choice among many possibilities which are very
unequal in merit. Each year many automobiles are bought because of the
appeal of some gadget, such as a fancy radiator ornament or cigarette
lighter, and many standard tests are bought for no better reason. Whether
a purchaser buys or is merely sold depends largely on whether or not he
knows what to look for in the article in question. Moreover, every teacher
will have occasion to use tests of his own construction, and should know
what qualities to strive for in such tests. As a rule, the same
characteristics are essential in an informal test made by the classroom
teacher as in a standard test bought ready-made from a publisher.

In any satisfactory measuring instrument three qualities are
indispensable. These are

1. Validity

2. Reliability

3. Usability

It is essential, therefore, that every teacher have a clear idea regarding
the meaning of these characteristics, and know how to judge their presence
in tests, whether standardized or nonstandardized.

B. Validity

Validity is the degree to which a test or other measuring instrument
measures what it claims to measure. In a word, validity means truthfulness.
Does the test really measure what it is supposed to measure? The validity
of an arithmetic reasoning test is the degree to which it succeeds in
measuring reasoning ability in arithmetic rather than other things, such as
reading ability or general intelligence. Validity, then, refers to the
truthfulness of the test and is always its most important characteristic.
No matter what other merits the test may possess, if it lacks validity, it
is worthless. Whether you are selecting a standard test or making an
informal test, the first thing to consider is its validity. How, then, does
one judge whether or not a test or other measuring instrument is valid?

General considerations. The answer to this question may best be
approached by giving attention to some preliminary considerations of a
general nature.

1. The nature of the thing being measured must always determine the
methods and materials of measurement. In order to judge the validity of
an intelligence test, for example, it is necessary to consider what
intelligence is, what its qualities are, or at any rate, how it manifests
itself. In like manner, in order to judge the validity of an achievement
test, it is necessary to consider what it is that the achievement test is
supposed to measure. This means that the first step in judging the validity
of achievement tests is a clear statement of the specific objectives of the
course or subject.

2. Any measurement in education is always a sampling, never entirely
complete. The test maker relies upon a sample much as does the chemist
in the health department in passing upon the quality of the city's water
supply. In psychological language, any test is merely a series of situations
designed to call forth a sufficient number of representative responses to
enable the examiner to determine the amount of the thing in question that
happens to be present.

3. The accuracy of the measurement, its fineness of discrimination, will
depend upon the purpose it is to serve. A cheap alarm clock will usually
suffice for a housewife in determining when to prepare lunch or to expect
the postman, but a finer timepiece is required for the locomotive engineer.
In like manner, a sundial or hourglass may be adequate for a gardener,
but a split-second watch is essential for a football official. It would be
almost as absurd to attempt to use a sundial to time a football game as to
use it for measuring temperature or wind velocity. In other words, the
validity of the measuring instrument must always be considered in relation
to the purpose it is to serve. Validity is always specific, in relation to
some definite situation. A test is not just valid; it is valid for something.
There is no such thing as general validity.

See page 111 for the distinction between curricular and statistical validity.





1. The Validation of Intelligence Tests

Although the job of constructing the so-called tests of general
intelligence is usually turned over to the specialist, a general knowledge
of how such tests are validated will enable the teacher to select and use
them more discriminatingly.

The meaning of intelligence. What, then, is meant by "general
intelligence," the thing such tests claim to measure? Although there is no
unanimity among psychologists regarding the exact definition of
intelligence, there is substantial agreement that what existing tests
attempt to measure is capacity to learn, particularly to learn the academic
tasks imposed by the school. Such a conception of intelligence is not very
"general" after all. It is clearly narrower than the popular notion, since
it is restricted largely to abstract intelligence and leaves out of account
social intelligence, mechanical intelligence, and intelligence in special
fields such as athletics, music, or oratory.

It is also clear at the outset that intelligence can be measured only
indirectly; its presence must be inferred from the observed behavior of the
individual, his reactions to certain carefully chosen and controlled
situations called tests. Such tests should meet two general requirements.
First, there must be a sufficiently large and varied assortment of test
situations to call forth a wide variety of mental operations, primarily of
the higher type, such as imagination, judgment, and reasoning. Second, the
situations must be of such a nature that every individual taking the test
has had approximately equal opportunity to learn, and as far as possible,
equal motivation. This second standard is hard to meet and is usually only
approximated even in the best tests. It clearly rules out tests that involve
special talents, such as for music or art, and makes questionable those that
depend on specific school experience, which is by no means uniform for all
pupils. In general, group tests meet these standards, especially the second,
less well than do individual tests. The Army Alpha, for example, not only
employs such situations as reading, vocabulary, and arithmetic, but, being
originally designed for soldiers, has material that is more within the
experience of men and boys than of women and girls.

The Terman criteria. In developing the 1916 Stanford Revision of the
Binet Scale, Terman relied upon three additional criteria of intelligence,
namely, age increase, coherency, and world success. Age increase means that
each test item must show an increasing percentage of successful responses
from one year level to the next. This is only a partial criterion, since it
must assume that the items chosen are of a type that may reasonably be
expected to measure intelligence. Purely physical measurements, for example,
such as strength of grip, or speed in running, show age increases. The second
criterion, coherency, rests on the assumption that the whole test is a
more valid measure of intelligence than are any of its parts. On the basis
of the entire test the group is divided into dull, normal, and bright
children. Then, to be acceptable, each item must discriminate among these
groups, showing an increasing percentage of successes as we go from the dull
to the bright. This criterion merely measures the internal consistency of
the test, much as a logician judges the validity of a course of reasoning.
Both Galton and Binet used the method of contrasting groups, although their
groups were selected from external criteria rather than from the test
itself.

For a discussion of the procedure used in the Revised (1937) Stanford-Binet,
see "The Revision Procedure," in Quinn McNemar, The Revision of the
Stanford-Binet Scale. Boston: Houghton Mifflin Company, 1942.


The third criterion, world success, is the ordinary common-sense standard
of everyday life. As the test is validated on children, this really means
the child's world, which is primarily that of the school, his standing in
which is reflected in his academic record. This is of course not a perfect
criterion. It is not only highly subjective in character, but it throws the
primary responsibility ultimately upon the judgment of teachers, which,
because of its limitations, the test is being designed to replace or
supplement. This is not so bad as it seems, however, for the basis is not
that of the pupil's mark on a single examination, whose notorious
unreliability has already been described, but rather that of his entire
record for an extensive period, a far more stable thing. Furthermore, the
reliance is usually placed not upon the judgment of any single teacher but
rather upon the average of several experienced teachers. The consensus of
competent persons is the ultimate criterion of values, from the
constitutionality of a law down to the beauty contest at the local theater.

Individual versus group tests. It is generally assumed that an individual
test is likely to meet more fully the criteria described than does a
group test. Furthermore, the individual test permits the trained examiner
to observe more carefully the behavior of the subject during the course of
the examination. For example, if the subject shows signs of nervousness or
refuses to cooperate fully, the examiner realizes that a valid measure of
intelligence is impossible under the circumstances and so waits for a more
opportune time. Also, if the subject is handicapped by defective vision
or hearing, this condition is likely to be discovered by the examiner, who
then takes it into account in making his interpretations and
recommendations. For these reasons the individual intelligence test is
usually taken as the criterion or standard for validating the group test.
However, sometimes a group test is validated by comparing the scores made on
that test with those made by the same individuals on another group test, or
possibly some combination of two or more group tests. For all such
comparisons with a criterion, whether it be the Revised Stanford-Binet or
some group test or tests, the Pearson product-moment coefficient of
correlation is usually employed; r is then referred to as the validity
coefficient. If the agreement of the test with the criterion is perfect, the
coefficient is 1.00, and if there is no consistent relationship at all, the
coefficient is .00. Naturally, the nearer the coefficient approaches 1.00,
the higher the validity is said to be, although in the last analysis
everything depends upon the appropriateness of the criterion itself. Usually
the most difficult step in test validation is securing an adequate
criterion.
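To make the idea of a validity coefficient concrete, the following Python
sketch computes Pearson's r between scores on a group test and scores on a
criterion. It is a minimal illustration only: the two lists of IQs are
invented, and the function name pearson_r is an assumption, not something
taken from any published procedure.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical IQs for the same eight pupils on a group test and on an
# individually administered criterion test.
group_test = [98, 105, 110, 92, 120, 101, 115, 88]
criterion = [101, 103, 114, 95, 118, 97, 121, 90]

validity = pearson_r(group_test, criterion)

# Perfect agreement in relative standing gives a coefficient of 1.00.
perfect = pearson_r(group_test, [2 * s + 3 for s in group_test])

print(f"validity coefficient r = {validity:.2f}")
print(f"r against a perfectly related criterion = {perfect:.2f}")
```

A high coefficient here says only that the test ranks pupils much as the
criterion does; whether the criterion itself is appropriate remains a matter
of judgment.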


TABLE 24

Intercorrelations of Intelligence Test Scores and Five-Semester Average
Grades for 284 Seniors (124 Boys, 160 Girls) in a Los Angeles High School

                        Otis Self-    Terman-       California    SRA           SRA Primary   Average
                        Administering McNemar       Short-Form    Non-Verbal    Mental        Grade
                                                                                Abilities
Intelligence Test       Boys   Girls  Boys   Girls  Boys   Girls  Boys   Girls  Boys   Girls  Boys   Girls

Otis                    --     --     .75    .77    .73    .67    .36    .24    .57    .57    .36    .44
Terman-McNemar          .75    .77    --     --     .70    .70    .27    .36    .56    .45    .45    .52
California Short-Form   .73    .67    .70    .70    --     --     .31    .24    .54    .58    .38    .50
SRA Non-Verbal          .36    .24    .27    .36    .31    .24    --     --     .33    .40    .23    .40
SRA Primary Mental      .57    .57    .56    .45    .54    .58    .33    .40    --     --     .42    .44
  Abilities

Mean IQ                 103.7         105.5         114.2         118.2         96.4          --


Table 24 contains the intercorrelations of five intelligence tests and their
correlations with average grades, separately by sex. It also shows the mean
IQ on each test for the boys and girls lumped together. The Otis
Self-Administering Higher Examination, Form A, correlates with the
Terman-McNemar .75 for boys and .77 for girls, while the Terman-McNemar
correlates only .27 and .36 with the Science Research Associates (SRA)
Non-Verbal. Apparently the Otis, Terman-McNemar, and California Short-Form
Test of Mental Maturity (1942 edition) are more closely related to
each other than to the other two tests.

The most valid test for "predicting" the five-semester average-grade
criterion is the Terman-McNemar, with coefficients of .45 for boys and

Data from the following unpublished (mimeographed) report: Walter G. Hrsl
and Alice Horn, A Comparative Study of the Data for Five Different
Intelligence Tests Administered to 284 Twelfth Grade Students at South Gate
High School. Los Angeles: Curriculum Division, Los Angeles City School
Districts, February 1950.










.52 for girls; the SRA Non-Verbal is least valid. Note that in each of the
[...] coefficients, however. All five [...] last semester of their [...];
thus there is no prediction here [...].

Similar [...] are reported [...] intelligence-test scores of [...] .30 for
54 individuals tested [...] grade and their average high school [...]. These
validity coefficients [...] in the third grade, and .31 for [...] grade
[...].

[...] even though the means are quite [...] wholly independent of [...]
means, so it tells us nothing about how interchangeable scores on two highly
correlated tests are. In Table 24 the mean IQs for [...] students go from
96.4 for [...] to 118.2 on the SRA Non-Verbal [...] same publisher. On the
other hand, [...] differ by only 1.8 points.
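The point that a correlation coefficient is independent of the means can be
checked with a small Python sketch. The scores below are invented for the
purpose, and pearson_r is a hypothetical helper name: the second "test"
simply yields IQs 20 points higher for every pupil, so the two tests
correlate perfectly although their scores are far from interchangeable.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical IQs on test A; test B gives every pupil 20 points more.
test_a = [88, 92, 97, 100, 104, 109, 115]
test_b = [score + 20 for score in test_a]

r = pearson_r(test_a, test_b)
mean_gap = sum(test_b) / len(test_b) - sum(test_a) / len(test_a)

print(f"r = {r:.2f}, difference between mean IQs = {mean_gap:.1f}")
```

A pupil's IQ on one such test could not be read as his IQ on the other,
however high the correlation between them.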

2. The Validation of Achievement Tests

Curricular versus statistical validity. [...] of an achievement test [...]
intelligence test, and a great number of [...] tests [...] determination of
validity. In discussing [...] validity a distinction should be made between
[...]. By curricular validity is [...] which the content of the test is
truly representative of the content of the course. [...] validity implies an
act of judgment as to the adequacy of the sampling. In the earlier days this
[...] interpreted to mean merely the extent to which the items of the test
[...] representative [...] essential materials employed [...]. [...]
validity is thought of [...] in terms [...] subject matter, which at best is
merely the stimulus, [...] rather in [...] reaction expected of the pupils
themselves. In other words, the center of gravity has shifted from the
curriculum [...].

Notebook No 3 





Statistical validity refers to the mathematical processes for determining
the degree to which the test agrees with, or correlates with, some criterion
which is set up as an acceptable measure of the thing in question. Some of
these statistical procedures aim at validating the test as a whole and others
at validating the items individually. Although the procedures commonly
employed by professional test makers are often rather technical, especially
for item validation, the essential ideas are relatively simple.

The technique employed in the preparation of the Cooperative Achievement
Tests represents an effective combination of statistical analysis and
the judgment of experts. These tests are constructed by a trained staff
working in close co-operation with classroom teachers, subject-matter
specialists, and test technicians. The procedure is outlined as follows:

a. Preliminary planning and selection of content

Analyses of curricula, textbooks, research studies, etc.
Formulation of objectives and determination of general plan
Preparation of detailed test outlines based upon survey of materials
Submission of outlines to authorities for criticism
Revision of test outlines in accordance with suggestions of critics

b. Preparation and editing of test items

Writing of items by test editors and cooperating experts
Submission of items to authorities for criticism
Revision of items in view of suggestions received
Preparation of experimental forms of test

c. Administration of experimental forms to a representative sampling of
students to obtain item difficulty and validity indices, and to detect items
which may be weak or ambiguous

d. Preparation of final form

Selection and revision of items for tentative final form
Obtaining from experts in subject-matter fields, test technicians, etc.,
suggestions and criticisms of the tentative final form
Revision and final editing of the test, based on the criticisms and
suggestions received

e. Administration [...] with earlier forms for equating and
determination of scaled scores


Perhaps attention should also be called to certain limitations of
frequency of mention or use as a criterion for selecting materials either
for the curriculum or for the test. In the first place, to accept what is as
a criterion of what ought to be leaves no room for progress. For example,
someone has defined a synonym as a word you use when you do not know how to
spell the word you want. It can scarcely be doubted that there is a wide
margin between the words actually used in ordinary speaking and writing
and those that should be used to convey best the meaning intended. In the
second place, frequency by its very nature is a poor standard for judging
importance. For example, birth and death occur but once in the life history


Cooperative Achievement Tests for High School and College. New York: [...]
distributed by the Cooperative Test Division of Educational Testing Service,
20 Nassau Street, Princeton, New Jersey.



[...] say they are for this reason less important than dressing and
undressing, which occur every day? Frequency is important as one measure of
utility, but it can rarely be regarded as the best criterion for validating
a test. It should usually be employed with other criteria, rather than alone.
Some criticisms of test validity. One of the commonest criticisms of
the validity of achievement tests, especially those of the objective type,
whether standardized or nonstandardized, is that they are predominantly
factual in character. It is alleged that they succeed merely in measuring
verbal memory as distinguished from genuine understanding and leave
unmeasured the really important outcomes, such as discrimination, judgment,
intellectual and emotional attitudes, appreciations, and the ability to make
intelligent application of knowledge to new situations. Even the best
friends of achievement tests readily admit that, as such tests are commonly
made and used, the criticism has some merit. In fact, no one has recognized
the limitation of existing tests more clearly than some of the outstanding
leaders of the measurement movement itself. Long ago Thorndike wrote:


In the elementary schools we now have many inadequate and even fantastic
procedures parading behind the banner of educational science. Alleged
measurements are reported and used which measure the fact in question about
as well as the noise of the thunder measures the voltage of the lightning.
To nobody are such more detestable than to the scientific worker with
educational measurements.


Thirteen years later Monroe wrote about the "child-like faith in the
efficacy of objective tests as instruments for measuring school achievement
on the high school and college levels." Three examples of unwarranted
beliefs were cited:


I. Objectivity in scoring is an essential requirement for a satisfactory
test, and if a test is objective, the scores yielded by it may be considered
highly accurate measures of school achievement.

II. If a test has been shown to be highly reliable, the scores yielded by it
are highly accurate measures of the achievement specified by its announced
or implied function.

III. A high correlation with a criterion is sufficient evidence to justify
the use of the scores yielded by a test as highly accurate measures of the
achievement considered to be defined by the criterion.


While all three points are related to validity, the first two are more
appropriately treated in later sections. The third point merits further
discussion.

Validity coefficients are no exception to the general principle which holds
that all coefficients of correlation are definitely influenced by the
variability

E. L. Thorndike, "Measurement in Education," Twenty-first Yearbook of the
National [...]

[...] Society 41:48-62, January 12, 1935.





the possession of knowledge and the ability to use it, which averages even
less, one would hardly expect to find the perfect correlation between
educational facilities and educational performance that such practice
appears to assume. It is doubtless still true that there are Mark Hopkinses
capable of transforming mere logs into colleges, while marble palaces may
remain but piles of stone for lack of such a magic touch. In any case, it is
safer to examine what is happening to the student at his end of the log than
to remain content to measure the dimensions of the log, or even the
credentials of the individual who happens to be at the other end.

Evidence concerning the value of indirect measures is conflicting. For
example, in a study involving a test administered to 300,000 young men,
which measured both intelligence and general achievement, Davenport and
Remmers found rather high r's between the test means for states and
such state characteristics as telephones per thousand persons (.83), per
capita income (.81), value of school property (.76), and Negroes per
thousand persons (-.70). They located and named "state economic,"
"rural-urban," and "deep-South versus non-South" factors which seemed to
account for most of the correlations found. Their conclusion is:

These data are all state data; they do not apply to individuals. Without
much facetiousness, however, we interpret these results to mean that the
probabilities of reaching a high educational achievement are much greater if
one comes from a high income state which is highly urban, which is not in
the South, and which has such advantages as library service available to
most of its population, has a high proportion of foreign-born citizens, a
large number of residents in Who's Who, and many telephones.


Using achievement and intelligence test results from 154 communities,
large and small, throughout the United States, Thorndike found 24 community
variables to be much more highly correlated with intelligence than
with achievement. In fact, the estimated maximum correlation of these
community aspects (population in thousands, per cent native-born white,
and so forth) with achievement test means was only about .30. Thorndike
offers several possible explanations of this low relationship.
It may, of course, be expedient at times to rely upon indirect evidence,
but at any rate, one should do so only where direct evidence is not available
and even then with full realization of the risks involved. For example, up
to the present time test makers have found it difficult to devise suitable
instruments for measuring such intangible outcomes of teaching as attitudes
[...] and [...] it is probably true that the better standard tests come far
closer to measuring the objectives actually attained or aimed at in
educational practice than they come to measuring those [...]. It is
apparently just as difficult [...] as it is to [...] them. It seems
reasonable to think that it is no less difficult to provide the appropriate
teaching materials for bringing about the right kind of attitudes,
appreciations, and interests than it is to provide the appropriate testing
materials for determining how well the job is being done. It should be kept
in mind that a valid test consists largely of a representative sampling of
the materials that make up the course. It should help to clarify the
atmosphere once and for all to recognize frankly that the less tangible
outcomes are harder to teach and to test than the more tangible outcomes. And
it may be that for some time to come we shall have to be content to aim at
both indirectly.

Kelley, Interpretation of Educational Measurements, page [...].
Yonkers-on-Hudson, New York: World Book Company, 1927.

Davenport and Remmers, "[...] State Characteristics [...]," Journal of
Educational Psychology 41 [...].

Ibid., page 115.

Robert L. Thorndike, "Community Variables As Predictors of Intelligence and
Academic Achievement," Journal of Educational Psychology 42:321-338, October
1951; "Note of Correction," 43:179-180, March 1952.

But this is no permanent solution to the problem. One of the important
services the measurement movement can render education is the clarification
of its objectives. The necessity for this should be apparent both to the
curriculum maker and to the test maker. Considerable progress has already
been made in this direction and more will doubtless be forthcoming. The
pioneering work of Wrightstone is an illustration. He reports a series of
tests in the social studies with such a diversity of aims as the
interpretation of facts, the making of generalizations, the organization of
data, several important work-study skills, and certain civic attitudes and
beliefs. Reports of the Eight-Year Study of the Progressive Education
Association indicate substantial progress in this direction.

Item analysis. Specialists in test construction not only attempt to
validate the test as a whole against some outside criterion but also to
validate the items on the test individually, usually against an inside
criterion, the test as a whole. Frequently an outside criterion would be
better, but it is often not available. Although many of the processes for
item analysis are rather technical and complicated, the essential idea is
easy to grasp. The purpose is to determine the difficulty and the
discriminating value of each item in the test. Obviously an item missed by
everybody or answered correctly by everybody who took the test is of no
value in differentiating between good and poor pupils. If the test is for
the purpose of determining the extent to which the minimum essentials of a
unit or of a course have been mastered, however, the difficulty of the
individual items is relatively unimportant and the matter of discrimination
is of minor significance. But if the test is to be used over several grades
as a basis of classification or

J. Wayne Wrightstone, "Measuring Some Major Objectives of the Social
Studies," School Review 43:771-779, December 1935; also J. W. Wrightstone,
Appraisal of Experimental High School Practices. 194 pages. New York: Bureau
of Publications, Teachers College, Columbia University, 1936.

Eugene R. Smith, Ralph W. Tyler, and staff, Appraising and Recording Student
Progress. 550 pages. New York: Harper and Brothers, 1942.



school marks, the discriminating value of the items is of major importance.
With the exception of a few easy items at the beginning of such a test for
the purpose of building morale in the pupils taking it, the items should
show a percentage of successes increasing progressively from the poorest
pupils to the best.

Only the simpler processes need concern the classroom teacher, for there
is considerable doubt whether the elaborate techniques are enough better
than the simpler ones to justify the additional labor involved.

One of the writers has devised a method of item analysis that is simple
enough to be performed by a reasonably conscientious high school student.
This procedure is explained fully in Appendix B, pages 436-453, where a
typical classroom test is analyzed. The method can be outlined as follows:

1. Administer the test and score the papers, preferably putting a red X
beside each incorrectly answered or omitted question.

2. On the basis of each student's total score, find the 27 per cent of the
persons tested who scored highest (had the fewest X's). Call this the "high"
group. Find the 27 per cent who scored lowest (had the most X's) and call
this the "low" group. For instance, if there were 60 persons tested, the
number in the high group would be 0.27 × 60 = 16.2, or 16 to the nearest
whole number. The number in the low group would also be 16. This would leave
in the "middle" group 60 - (16 + 16) = 28 papers to be put aside, for they
are not needed in the item analysis.

3. Start with Item 1 on the test. How many persons in the low group
missed it? How many persons in the high group? Subtract the number of
persons in the high group who missed the item from the number in the low
group who missed it.

4. Repeat Step 3 for every question in the test, each time determining
the difference between the number of students in the low group who missed
the item and the number in the high group who missed it.

5. List the differences in ascending order, beginning with the largest
negative one and going down to the largest positive one, together with the
item's number in the test. There should be as many differences as there are
test questions. The largest positive differences (near the bottom of the
list) indicate the most discriminating items, while the small positive
differences and the negative differences suggest that those items are not
discriminating properly and should therefore be looked over carefully for
vagueness, improper keying, or unattractive options.

For the complete process, see Appendix B.
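The five steps above can be sketched in Python for a teacher who keeps the
scored papers in machine-readable form. This is a minimal sketch under stated
assumptions, not the procedure of Appendix B itself: the function name, the
data layout (one list of True/False marks per pupil), and the ten invented
papers are all hypothetical, and papers tied at a group boundary are simply
taken in the order in which they were entered.

```python
def item_analysis(papers, fraction=0.27):
    """Return (low_missed - high_missed, item_number) pairs in ascending order.

    papers: one list per pupil, True where the item was answered correctly,
            False where it was missed or omitted (hypothetical layout).
    """
    group_size = round(fraction * len(papers))
    if group_size < 1:
        raise ValueError("too few papers to form high and low groups")

    # Step 2: rank papers by total score and take the top and bottom groups.
    ranked = sorted(papers, key=sum, reverse=True)
    high = ranked[:group_size]
    low = ranked[-group_size:]

    # Steps 3 and 4: for each item, low-group misses minus high-group misses.
    differences = []
    for item in range(len(papers[0])):
        low_missed = sum(1 for paper in low if not paper[item])
        high_missed = sum(1 for paper in high if not paper[item])
        differences.append((low_missed - high_missed, item + 1))

    # Step 5: ascending order, most negative difference first.
    return sorted(differences)

T, F = True, False
# Ten invented papers on a five-item test.
papers = [
    [T, T, T, T, F], [T, T, T, T, F], [T, T, T, T, T],   # high scorers
    [T, T, T, F, F], [T, T, F, T, F], [T, F, T, F, T], [T, T, F, F, T],
    [T, F, F, F, T], [T, F, F, F, T], [T, T, F, F, F],   # low scorers
]

results = item_analysis(papers)
for difference, item_number in results:
    print(f"item {item_number}: low minus high = {difference}")
```

In this made-up example Item 5 shows a negative difference and Item 1 a
difference of zero, so both would be looked over, while Items 3 and 4
discriminate best.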


The chief value of item analysis to the teacher is that it helps make better
tests by revealing unsuspected flaws in items that would otherwise
probably continue to appear in these and later questions.

[...] at the American Psychological Association convention in Chicago,
[...] 1951.

Frequently, perhaps generally, it will be found that the trouble is in the
wording of the item, the language being vague, ambiguous, or positively
misleading. In that case a rewording of the item may be all that is
necessary. At times, however, the difficulty is more obscure and the item may
have to be eliminated altogether. Lindquist found that “adequately” and
“ndM-cn.” were equally difficult for eighth-grade spellers but that the
former discriminated in favor of the good spellers and the latter in favor
of the poor spellers. Difficulty alone, therefore, is not a dependable measure
of discrimination, for according to that criterion both items are equally
good. Test experts have usually found, however, that the average difficulty
of the items in a test is related to the adequacy of the test as a whole. The
rule suggested for the construction of tests to discriminate best among all
the members of a group is to make every item of 50 per cent difficulty when
corrected for chance, so far as possible. This will mean that virtually all
items of 0-15 per cent and 85-100 per cent difficulty when corrected for
chance will be omitted from the revised form of the test unless they can be
rewritten to make them closer to the 50 per cent difficulty level.
Tests designed primarily for instructional purposes, however, may at
times be made much easier with good results.
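
The rule of thumb just stated can be applied mechanically once each
item’s difficulty has been corrected for chance. The sketch below assumes
that the corrected per cent difficulties are already available; the
function name `flag_extreme_items`, the dictionary layout, and the
decision to treat the boundary values 15 and 85 as extreme are
illustrative assumptions rather than part of the rule itself.

```python
def flag_extreme_items(difficulties, low=15.0, high=85.0):
    """Split items into those to keep and those to rewrite or omit.

    difficulties: dict mapping item number -> per cent difficulty,
                  assumed to be already corrected for chance.
    Returns (keep, revise), each a list of item numbers in ascending order.
    """
    keep, revise = [], []
    for item, per_cent in sorted(difficulties.items()):
        if per_cent <= low or per_cent >= high:
            revise.append(item)   # 0-15 or 85-100 per cent difficulty
        else:
            keep.append(item)     # closer to the 50 per cent level
    return keep, revise
```

Items in the second list are candidates for rewriting toward the
50 per cent level before they are dropped from the revised form.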
Judging the validity of standard tests. It is always desirable to
examine with some care the content of a standard test before deciding to
use it. Some of the earlier tests in particular contained serious errors.
Upton called attention to some of these in arithmetic tests, and Diamond
found 318 errors in 3,303 items making up the content of sixteen
widely used tests in biology and general science. Only one test was found
to be entirely free from error. A study of five tests of English usage revealed
that from 16 to 55 per cent of the items called wrong were actually
acceptable according to standards published by the National Council of
Teachers of English. Even when there are no errors, the items used often
stress the relatively unimportant aspects of the subject.

The test manual also should be examined, because it frequently gives
data on the validity of the test; for example, it should tell who made the
test, how the items were chosen, what standardization and validation
procedure was followed, and other pertinent information. If the author does


Correcting test scores for guessing is discussed on page 156. Items are corrected
Upton, “... of Standardized Tests on the Curriculum of

“On the Validity of Standardized Tests of English Usage,”
School and Society, 50: 767, December 9, 1939.





not give such information to the prospective users, it is safe to assume that
the test is of doubtful validity, for it is evident that the author does not
attach as much importance to the matter as is desirable. While it is
unnecessary to ascribe improper motives to test authors and publishers, most
of whom are of a very high type, it is important to recognize that they are
nevertheless human, and it is reasonable to make some allowance for a
little overenthusiasm about the merits of their own progeny. Whenever
available, therefore, the results reported in the professional literature by
other users are likely to be especially valuable.
A systematic attempt to provide the information required by the test
user as a basis for an intelligent choice of tests has been made by Buros.
In a series of Mental Measurements Yearbooks he plans to make available
critical evaluation of all recent tests by one or more competent persons
independently. These publications will be found indispensable in selecting
tests. The reviewer is instructed to make the reviews “frankly critical” and
“to base the appraisals upon his own criteria as to what constitutes a good
test.” The reviewers are described as follows:

In selecting reviewers an effort was made to choose persons representing a wide
variety of positions and viewpoints among actual and potential test users. As a
result a very heterogeneous group of reviewers have cooperated in the preparation
of this volume: classroom teachers, city school research workers, clinical
psychologists, curriculum specialists, guidance specialists, personnel workers,
psychologists, subject-matter specialists, and test technicians. It can be truly said
that the reviewers represent no one group or school of thought, unless the reviewers
are described as representing all test users, actual and potential, who are
considered especially competent in their fields and who have the courage to speak
frankly and honestly in appraising a standard test.

Ideally a reviewer of a standard test, such as a high school Latin test, ought to
possess the qualifications of a curriculum and teaching specialist, and a test
technician. Unfortunately all of these qualifications are rarely found in any one
person. The average quality of the reviews is likely to be highest when the


The American Psychological Association and other similar professional
organizations have become concerned with the quality of test manuals and the
methods of distributing tests. With regard to “interest inventories, personality
inventories, projective instruments and related clinical techniques, and tests of
aptitude or ability,” see “Technical Recommendations for Psychological Tests and
Diagnostic Techniques,” Psychological Bulletin, 51: 1-38, March.

A more general source is “Information Which Should Be Provided by Test Publishers
and Testing Agencies on the Validity and Use of Their Tests,” papers read by
Herbert S. Conrad, Paul L. Dressel, and Laurance F. Shaffer, Proceedings of the 1949
Invitational Conference on Testing Problems. Princeton, New Jersey: Educational
Testing Service, 1950.

Oscar K. Buros, The Fourth Mental Measurements Yearbook. Highland Park, New
Jersey: Gryphon Press, 1953. 1180 pages. This yearbook covers the years 1948 through
1951 and therefore supplements rather than supplants the older yearbooks.

For bibliographies and discussions of tests during the period August 1, 1949,
through July 1, 1952, see Frederick B. Davis (editor), “Educational and
Psychological Testing,” Review of Educational Research, 23: 1-110, February 1953.




valid for all purposes, or for the same purpose in all situations.
Furthermore, there is no way of knowing when a new test may appear with merits
so outstanding as to render obsolete earlier tests that have hitherto been
entitled to high comparative ratings. But, all things considered, the best
available sources of information are probably the measurement specialists
in reputable colleges and universities, of which there are several in most
states. Their recommendations can usually be relied upon to be impartial
and based upon a wider acquaintance with existing tests than the average
teacher or school administrator is likely to have. But in the final analysis,
when all the cards are on the table, the teacher or administrator must rely
upon his own judgment. The necessary background for making such a
judgment intelligently should be specifically provided for in the professional
training of teachers. The data required for such judgments should be made
available by the test publishers and by such publications as the Mental
Measurements Yearbooks.


C. Reliability


Meaning of reliability. By reliability is meant the degree to which
the test agrees with itself. To what extent can two or more forms of the test
be relied upon to give the same results, or the same test to give the same
results when repeated? If the scores on the test are stable under these
conditions, the test is said to be reliable. In a word, reliability means consistency.
The terms reliability and validity are often confused, but there is a
clear-cut distinction between them. Reliability, as such, has nothing to do with
the truthfulness of the measurement, but is concerned only with its
consistency, an entirely different thing. A homely illustration may help to
clarify the distinction. A man returns from his vacation with a picturesque
story of the fish he claims to have caught. As he meets friend after friend,
there is always the same glowing account, even to the minutest detail.
Now, in a statistical sense the story is reliable, for it is certainly consistent.
Unfortunately, the fisherman’s veracity is not thereby established, for
consistency by itself gives no assurance of truthfulness or validity. In reality
the story might be sheer fiction from beginning to end.

Importance of reliability. Shakespeare said: “Consistency, thou art
a jewel,” and he was right. But consistency is not the greatest jewel,


Oscar K. Buros, The Nineteen Forty Mental Measurements Yearbook, pages 12-13.
Highland Park, N. J.: Gryphon Press, 1941.

For a thorough discussion of this concept, see Robert L. Thorndike, “Reliability,”
in E. F. Lindquist (Editor), Educational Measurement, pages 560-620. Washington,
D. C.: American Council on Education, 1951.





whether in a test or elsewhere. By itself consistency, or reliability, is a
doubtful virtue, for a test, as well as a person, might be consistently wrong,
but its absence is a sign of weakness. Although high reliability is no
guarantee that the test is good, low reliability does indicate that it is poor.
In the above illustration it should be noted that had marked discrepancies
occurred in the fisherman’s story from time to time, considerable doubt
would have been cast upon his truthfulness. Validity is always the first
quality to be sought in a test and, granted that, reliability is a valuable
auxiliary. The ideal test tells the truth consistently.
There can be little question that test makers have given too much
attention, relatively, to determining the reliability of tests, and too little to
establishing their validity. One reason for this, doubtless, is that the former
is easier to determine. Much harm has resulted, however, when uncritical
users have naively assumed that reliability insures validity, a view which
is wholly erroneous.

Methods of determining reliability. The term reliability is purely a
statistical concept. Contrary to what was found in the case of curricular
validity, little can be told about the reliability of a test from examining
the test blank itself. It is of course true that if a test can be objectively
scored it is more likely to be reliable than if the scoring is subjective, but
the degree of reliability cannot be determined by that fact. It is also true
that a long test has a greater likelihood of being reliable than a short test,
but there are many exceptions. In the last analysis, however, somebody
must try the test out to determine its reliability. Usually the author of the test
does this and reports the results in the test manual. If such is not the case
one has a right to be suspicious of the merits of the test.

Method with two test forms. Three rather distinct techniques are
used to establish the reliability of a test. The method commonly used by
makers of standard tests is to prepare two or more parallel forms of the test,
and then to give these equivalent forms of the test to a large number of
pupils, usually with only a short interval between the tests. The test is said
to be reliable if there is close agreement between the scores on the two
forms, that is, if the pupils who made high scores on the first test also make
high scores on the second, if those who made low scores on the first test
again make low scores on the second, and so on for all ranks in between.
If the agreement is perfect, as is most unlikely, the correlation is 1.00.
On the other hand, if there is no consistent relationship the coefficient of
reliability is .00. It will be recalled that validity is also expressed as a
coefficient of correlation whose maximum value is 1.00, and whose minimum
value is .00. But in the case of statistical validity the agreement is with an
external criterion, whereas in the case of reliability the agreement is with
an internal criterion of some kind. In the above illustration this internal
criterion is another form of the same test, which presumably measures the
same function as the first test.
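
As a rough illustration of the two-form method, the coefficient of
reliability can be computed as the ordinary product-moment correlation
between the pupils’ scores on the two forms. The sketch below assumes
that this is the coefficient intended; the function name `pearson_r` and
the sample scores in the usage note are invented for the example.

```python
from math import sqrt


def pearson_r(form_a, form_b):
    """Product-moment correlation between scores on two forms of a test.

    form_a, form_b: equally long lists; position i holds pupil i's score
    on the first and second form respectively.
    """
    n = len(form_a)
    if n != len(form_b) or n < 2:
        raise ValueError("need the same pupils (at least two) on both forms")

    mean_a = sum(form_a) / n
    mean_b = sum(form_b) / n
    # Sums of cross-products and of squared deviations from the means.
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(form_a, form_b))
    s_aa = sum((a - mean_a) ** 2 for a in form_a)
    s_bb = sum((b - mean_b) ** 2 for b in form_b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("scores on one form do not vary")
    return s_ab / sqrt(s_aa * s_bb)
```

If every pupil keeps exactly the same rank and spacing on both forms, as
with scores of 10, 20, 30, 40 on one form and 12, 22, 32, 42 on the
other, the function returns 1.00; as the agreement between the two
series of scores weakens, the coefficient falls toward .00.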





repeat the test at a later time and to determine the extent of the
correlation between the two series of scores.

Another procedure is to give the test once only and then to record two
scores for each pupil, one for each half of the test, the two halves being
matched for content and difficulty. When the two series of scores are obtained the
coefficient of correlation between them is computed. This is the reliability
of the half test. The reliability of the whole test is then estimated by the
use of a special formula. A similar formula also makes it possible to
estimate the probable reliability of the test when increased to any required
length, assuming that the items added are of the same type and quality
as those in the original test.
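
The single-administration procedure can be sketched as follows. The
example assumes that the “special formula” is the Spearman-Brown formula
in its general form, r_n = n r / (1 + (n - 1) r), and that the two halves
are formed from the odd-numbered and even-numbered items; the odd-even
split is only one convenient way of approximating matched halves, and the
function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("scores on one half do not vary")
    return s_xy / sqrt(s_xx * s_yy)


def spearman_brown(r, length_factor=2.0):
    """Estimated reliability of a test length_factor times as long."""
    return length_factor * r / (1 + (length_factor - 1) * r)


def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per pupil."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)  # reliability of the half test
    return spearman_brown(r_half)            # stepped up to the whole test
```

For instance, a half-test correlation of .60 steps up to .75 for the
whole test, and a test with a reliability of .50 would be estimated to
reach .75 if tripled in length with items of the same type and quality.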


Both methods for obtaining the reliability of a single form of the test have
been severely criticized and as stoutly defended. The test-retest method
has certain serious limitations. If the test is long, to avoid fatigue and
boredom some time must elapse between the two trials. In the case of
achievement tests, particularly, this delay is likely to introduce other variables.
The pupils may discuss the test between trials, do extra study, or do other
things that may effect a change in the status of their knowledge. In
addition to this, their physical and mental conditions fluctuate from day to day,
even from hour to hour. For example, Ashbaugh found variability in one
fourth of the pupils who were given the same spelling test under highly
constant conditions three times within fifteen minutes. One would
appreciate the difficulty in determining the reliability of a certain type of
thermometer by checking the readings made at one hour against those made
later in the day. Guilford thinks that “it is safe to say that the average
test scale of mental ability is fully as reliable, or probably more so, than the
average clinical test in medicine, such as the test of blood pressure or the
basal metabolism test, whose reliability ranges from about .60 to .90.”
But there is also the contrary tendency in human beings for errors made
the first time to persist and to be repeated at later times. In an extreme
situation, where the pupils memorized the first series of answers, the
apparent reliability of the test would be perfect. Indeed, this tendency to echo
the original responses appears to be strong, for test-retest coefficients are
usually higher than those arrived at by correlating halves or equivalent
forms of the test. Because the correlation of the half tests eliminates, or


This is the simplest form of the Spearman-Brown “prophecy” or “step-up” formula,
which appears in many test manuals: the reliability of the whole test equals twice the
reliability of the half-test scores divided by (1 + the reliability of the half-test
scores), being written in symbols as
$r_{\text{whole}} = \dfrac{2\,r_{\text{half}}}{1 + r_{\text{half}}}$.

Ernest J. Ashbaugh, “Variability of Children in Spelling,” School and Society,
9: 93-98, January 18, 1919.

J. P. Guilford, “Intelligence Tests,” Education, 58, May 1938.





at any rate greatly reduces, memory carry-over, it is recommended by some
writers.

The split-half procedure, involving as it does the Spearman-Brown
formula, is based upon certain assumptions. The half-tests should be of equal
variability, and the items in one half must be of the same quality as those
in the other half. It must be emphasized that the formula requires the use
of matched halves of the test, not just any halves.

Kuder and Richardson have devised several methods of obtaining a
reliability coefficient which make it unnecessary to split the test into halves
or calculate a coefficient of correlation. Unfortunately, their most usable
formula, called KR No. 20, is very laborious for the classroom teacher to
compute.
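
What is laborious by hand is short work for a machine. The sketch below
uses the form in which Kuder-Richardson formula 20 is usually written,
r = [k / (k - 1)] [1 - (sum of p q) / (variance of total scores)], where
k is the number of items, p the proportion passing an item, and q = 1 - p.
That form is not written out in the text and is taken here as an
assumption, as are the function name `kr20`, the 0/1 scoring of items,
and the use of the variance computed with N (not N - 1) in the
denominator.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for items scored 0 (wrong) or 1 (right).

    responses: one list per examinee, each holding k item scores.
    """
    n_persons = len(responses)
    k = len(responses[0])
    if n_persons < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")

    # Variance of the total scores, dividing by the number of examinees.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    variance = sum((t - mean_total) ** 2 for t in totals) / n_persons
    if variance == 0:
        raise ValueError("total scores do not vary")

    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n_persons
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / variance)
```

No splitting of the test and no coefficient of correlation are required;
only the item proportions and the spread of the total scores enter the
computation.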

As Cronbach aptly points out, the comparable-forms, split-half, and
test-retest reliability coefficients get at different aspects of reliability. The
first is a “coefficient of equivalence and stability,” the second a “coefficient
of equivalence” only, and the third a “coefficient of stability.”

A strong word of caution is needed. Neither the split-half method nor the
various Kuder-Richardson formulas are applicable to speeded tests, for they
overestimate their reliability. A test is speeded when many of the examinees
would have made better scores had they been given more time. If nearly
all persons did about as well in the time allowed as they could have done
in a longer period, then the measuring instrument is a “power” test. In
practice, most timed tests involve both speed and power. Watch for the
rather frequent split-half or KR reliability coefficients reported in test
manuals for speeded tests; they are deceptively high.
One of the writers has published a simplified computational technique
for securing split-half reliability coefficients from unspeeded tests. Another
shortened procedure is illustrated in Appendix B on pages 452-453.

The interpretation of test reliability. What standard shall a test
meet in order to be considered satisfactory from the standpoint of
reliability? No simple answer to this question is possible. It depends, for one
thing, upon the fineness of discrimination required. Kelley has suggested


However, Clark has shown that the variation of split-half r’s from sample to sample
is much greater than the variation of such r’s within a sample due to different methods
of splitting, provided that the method of splitting is longitudinal (puts items
throughout the test in each half rather than having one half consist of the first items
and the other half of the last). See Edward L. Clark, “Methods of Splitting vs. Samples
as Sources of Instability in Test Reliability Coefficients,” Harvard Educational Review,
19: 178-182, May 1949.

G. F. Kuder and M. W. Richardson, “The Theory of the Estimation of Test
Reliability,” Psychometrika, 2: 151-160, September 1937.

Lee J. Cronbach, Essentials of Psychological Testing, pages 65-73. New York:
Harper & Brothers, 1949.

Julian C. Stanley, “A Simplified Method for Estimating the Split-Half Reliability
Coefficient of a Test,” Harvard Educational Review, 21: 221-224, Fall 1951.

Truman L. Kelley, op. cit.




the following minimum coefficients for a single test:

.50 for determining the status of a group in some subject or group of subjects

.90 for differentiating the achievement of a group in two or more scholastic lines

.94 for determining the status of individuals in the same subject or group of
subjects

.98 for differentiating individuals in two or more scholastic lines


The interpretation is also beset by many other difficulties. The coefficients
not only reflect somewhat the methods employed in their computation,
but also the variability of the groups, the interval between tests, and other
factors. For example, a test of average difficulty for a typical group may
be much less reliable when used with a markedly inferior or markedly
superior group.

In view of the fact that measures of reliability, no matter how arrived at,
are influenced by factors other than the form and content of the test itself,
it would appear that the value of such measures has been overemphasized.
The same energy devoted to improving the validity of the test would bring
better returns. It is not likely that the average teacher will find it profitable
to compute reliability coefficients for ordinary class tests, although it may
sometimes be worth while to do so for final examinations.


Objectivity and reliability. By objectivity in a measuring instrument
is meant the degree to which equally competent users get the same results.
Ordinary measures of height and weight, for example, are objective, while
estimates of beauty and integrity are subjective. The distinction between
objective measurement and subjective measurement is implied in the
question: “Do married men really live longer than single men, or does it just
seem longer?” As a rule, objectivity is very closely associated with
reliability. For this reason standard tests are usually more reliable than rating
scales. As a matter of fact, great impetus was imparted to the objective
test movement by the discovery that the major cause of the notorious
unreliability of the ordinary school examination was its subjectivity of
marking. The emphasis on objectivity has since gone so far, however, that many
educational workers seem to regard “objectivity” as synonymous with
“scientific method.” To such persons any element of subjectivity in a study
renders it hopelessly unscientific. It may be well, therefore, to look
carefully at this all-important matter of objectivity.

To discover at the outset that there is no such thing as a wholly objective
measure may be something of a shock. The plain fact is that objectivity
is always relative, never absolute. The measurements obtained by a
yardstick, for example, are only relatively objective, for one would hardly expect
a dozen different persons to get absolutely the same results in measuring
the length of the playground. They would probably agree to the nearest
foot, and possibly to the nearest inch, but they would usually disagree
markedly if the results were expressed in some such small unit as
hundredths of an inch. And, of course, such units as inch, foot, and yard are not
natural units, like day and year, but units set up by human judgment.
Brownell points out that there are always many subjective factors
involved even when the test used is of the so-called objective type. He says:

Some writers would abandon altogether the “blanket term” reliability for
more specific estimates of absolute and relative accuracy of measurement. See
Jackson and George A. Ferguson, page 25 in Studies on the Reliability of Tests,
Bulletin No. 12 of the Department of Educational Research, University of Toronto,
Bloor Street West, Toronto 5, Ontario, Canada, 1941.


Well, first of all, in the practical circumstances of teaching one decides to give a
test. The decision is surely not based upon purely objective considerations. Second,
one determines whether to make a test or to buy one. Third, one makes up
one’s mind regarding the kind of test (whether it is to be of the traditional or
of the newer types, or a combination); judgment again. Fourth, one settles upon
the scope of the test; judgment once more. Fifth, one selects the items to be
included; little objectivity here. Sixth, one chooses the form to be employed
(true-false, multiple choice, or what not); again little objectivity. Seventh, one
frames the items as carefully as one can, and once more has only his judgment for
guidance. Eighth, one prepares a key by listing the correct answers, a judgment
which may not be acceptable to other teachers even of the same subject. Ninth,
through opinion one defines the conditions of administering the test. Tenth, one
scores the papers; at last objectivity. But, eleventh, one assigns marks, another
increment of judgment, and a big one.


Brownell protests against what he regards as the overemphasis on
objectivity, which he thinks has unnecessarily lessened the depth and
narrowed the range of measurement. A safe position would appear to be to
try to make measurement as objective as possible without sacrificing validity.
It must be remembered that the latter is always more important. It is
never going to be possible or desirable to eliminate certain basic
assumptions underlying all attempts at evaluation. Undoubtedly at times,
however, we have made assumptions in measurement when we should have had
evidence. Many test makers, for example, have assumed that one problem
aelnalir.'l '’‘“E"”""’ "■ “nlhmetic IITien the matter was 

thlronenr iT r "" '=‘"*08, « both found 

amelv ""d "valid, owing 

iri" “ 

f U^ionol In-tniction.' 

sullen ol FnrtioS'^ b’'’'"’ Dumowv ot Piror m Multipli 

I> W. ...h . 1,"^“ Critvin Tj P-. of , e 

'-l-TOl,, lox Jmrrad e! I pu ,rah n 1 7 10 




of standardized tests must be taught and not merely told how to do it.
A serious effort should of course be made to eliminate all needless types of
subjectivity. A guess is usually a poor substitute for actual knowledge.

D. Usability


Meaning of usability. There is quite general agreement among
authorities in measurement that the two most important characteristics of a
measuring instrument are validity and reliability. Both have to do with the
theoretical accuracy with which the instrument measures. However, there
are certain other considerations of a practical character which must be
taken into account. In the judgment of the writers all of these may be
conveniently designated by the single term usability. By this is meant the
degree to which the test or other instrument can be successfully employed
by classroom teachers and school administrators without an undue
expenditure of time and energy. In a word, usability means practicability. A
measuring instrument must not only be valid and reliable but also usable. This
viewpoint is well expressed in The Methodology of Educational Research:


But we must always temporize ideals with practical considerations. Perhaps an
ideal instrument would be so cumbersome and expensive of effort and time that
its use would not be warranted.


Whether or not a test is usable by average teachers in service and other
persons whose technical training in measurement has been limited depends
upon several factors, of which the following are probably the most
important:


1. Ease of administration

2. Ease of scoring

3. Ease of interpretation and application

4. Low cost

5. Proper mechanical make-up


Each of these factors will now receive brief consideration.

Ease of administration. Group tests, as a rule, are much easier to
administer than individual tests. The Stanford-Binet is a good example of
a test whose validity and reliability are high but whose usability is low,
largely because of complicated instructions for giving and scoring. Special
training in a college course for one semester is usually suggested as the
minimum required for mastery of these instructions. Even then the test
makes heavy demands upon the examiner’s time.

There are of course two types of instructions for a test. One has to do

The Methodology of Educational Research, page 439. New York: D. Appleton-Century
Company, 1936.





with directions to the examiner, and the other has to do with directions
to the pupil or pupils. But, in general, the requirements are the same for
both. The motto of a certain news weekly indicates what is required: the
directions should be “clear, curt, complete.” Whether or not examples,
fore-exercises, and the like are necessary will depend mainly upon the age
and experience of the group being examined. Whether or not a group test
is easy to administer depends to a considerable extent upon the
completeness of the manual. Some tests have no time limits, many have generous
time limits, while still others are broken up into intervals as short as 3, 5,
8, 10, or 15 seconds. These short intervals are difficult to observe with a
stop watch and well-nigh impossible without it. On the other hand, tests
of the so-called self-administering type involve only one short set of
directions for the entire test. Most tests, however, are broken up into separate
sections, each of which has its own directions and time limit. In determining
how difficult a test is going to be to administer, a careful examination must
be made both of the manual and of the test blank itself.
Ease of scoring. The ease of scoring a test depends primarily upon
three things: objectivity, adequate keys, and full scoring directions. The
better standard tests rank high on all three counts. Scoring is also
facilitated when the pupil has been instructed to record his answers in a straight
column rather than irregularly over the page, and in the form of a numeral
or single word rather than a phrase or longer statement. As a rule, all
acceptable answers should appear on the key. With the exception of scales
in which score values of unequal weight are required, each correct item
should count the same as any other correct item in the test. The unequal
weighting of items, so common in the earlier tests, has been found to add
to the difficulty of scoring without a corresponding increase in validity or
reliability.

Three ways to speed up the scoring of objective tests are (1)
hand-scorable separate answer sheets, (2) machine-scorable separate answer
sheets, and (3) self-scoring answer sheets. Traxler says:


The growing tendency to employ separate answer sheets is perhaps the most
pronounced single trend in objective testing. It unquestionably has a restrictive
for it tends to force test items into a single-response
question. The use of other types of questions is not

ingenuity in setting up their

than are typically employed in devising

a test ought to be used when setting up separate answer forms.

There is not, at present, enough experimental evidence to warrant a definite

that authors have an inescapable

The main justification for the use of separate answer sheets is that they are


Arthur E. Traxler, op. cit.





c^n'toHj b^bronyh^faS^r^ruJeS^' 


Concerning machine scoring Traxler is less optimistic, pointing out that
even for very large testing programs “machine scoring may not be
faster or cheaper than manual scoring of a prescribed uniform program
where all manual scoring procedures can be highly routinized.”
Furthermore, unless carefully supervised, machine scoring may be less accurate
than independently checked hand scoring. The chief advantage of machine
scoring seems to be in those situations where the machine is kept going all
day long and is operated by a full-time highly skilled person.
Many self-scoring devices are available. Typical of these are “hat-pin”
punching methods utilized in the Kuder Preference Record and the Ohio
State University Psychological Test, the concealed carbon of the
Clapp-Young self-marking tests and Scoreze, and the “punchboard” procedure
of the Science Research Associates self-scorer, where the student punches
until he finally obtains the correct answer. All of these probably have
considerable merit as instructional and motivational devices, but if scores are
desired for evaluational purposes they can at the present time usually be
secured more easily and dependably by other methods, except perhaps for
the Kuder Preference Record.

Ease of interpretation and application. Whether or not the results
of a test are easy to interpret and apply depends primarily upon the
adequacy of the manual accompanying the test. In the first place, the manual
should contain complete norms to facilitate interpretation. Whenever
possible all derived scores should be capable of being read directly from tables
of norms without the necessity of computation. The norms should, as a rule,
be based both on age and on grade, and, in the case of high school
achievement tests, on the length of time the subject has been studied. It is also
desirable that achievement tests be provided with separate norms for urban
and rural pupils, and for pupils of various degrees of mentality. Up to the
present time very few tests are adequately provided with norms for
interpretation. Where the primary emphasis is upon diagnosis and other
instructional values of tests, this loss is not very great. In any event, it will
usually be necessary to rely heavily upon local norms.


S. L. Pressey, “Development and Appraisal of Devices Providing Immediate
Automatic Scoring of Objective Tests and Concomitant Self-Instruction,” Journal of
Psychology, 29: 417-447, April 1950.

::sSs;gra“o<S“okiosui.u„™^ 

Published by Houghton Mifflin Company.
Published by California Test Bureau.

Published by Science Research Associates.

A fuller discussion of norms will appear in Chapter 10.





Several of the better manuals give specific suggestions regarding the use
to be made of the test results. Supplying some suggestions as to results is
a valuable service for which it is hoped test publishers in the future will
accept more responsibility. For many uses it is necessary to have at least
two forms of the test equated both as to content and as to difficulty
throughout the full range of scores, and not for averages only. Few tests
meet this requirement fully.
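
The requirement that two forms agree throughout the full range of scores, and
not for averages only, can be illustrated with a short sketch. The score lists
and the function name below are hypothetical, chosen only so that the two
forms have the same average but different spreads. Comparing quartiles is one
simple way of looking past the averages; it is not a procedure taken from any
test manual.

```python
from statistics import mean, quantiles


def quantile_gaps(form_a, form_b, n=4):
    """Return the differences (Form A minus Form B) at each of the n-1 cut points."""
    cuts_a = quantiles(form_a, n=n, method="inclusive")
    cuts_b = quantiles(form_b, n=n, method="inclusive")
    return [a - b for a, b in zip(cuts_a, cuts_b)]


# Hypothetical raw scores earned by comparable groups on two forms of one test.
form_a = [10, 20, 30, 40, 50]
form_b = [20, 25, 30, 35, 40]

same_average = mean(form_a) == mean(form_b)
gaps = quantile_gaps(form_a, form_b)
largest_gap = max(abs(g) for g in gaps)

print("Averages agree:", same_average)
print("Quartile gaps (A - B):", gaps)
print("Largest gap:", largest_gap)
```

In this made-up case the averages agree exactly, yet the forms differ at the
lower and upper quartiles, which is the situation the text warns against.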

Cost. With the exception of certain laboratory apparatus and equipment
for measuring special abilities and disabilities, testing materials are
usually not very expensive. Few achievement tests covering a single subject,
or group tests of general intelligence, cost more than ten cents each.
Batteries covering several subjects when printed as a single booklet usually
cost from ten to fifteen cents. For a comprehensive testing program the
general battery will cost less than separate tests covering the same subjects.
Cost is a practical consideration in most school systems, and there is no
point in paying more for tests than necessary.
While it may be true in general, as commonly held, that in the long run
one gets about what he pays for, there are too many exceptions to make it
a safe rule. In statistical terminology the correlation between the cost of a
test and its worth is positive, but too low for accurate prediction. Here,
as elsewhere, the customer should be wary, lest he not get his money's
worth. In a test, as in an automobile, the quality is often not evident on the
surface. The prospective purchaser should not make cost a primary
consideration, for good tests are often no more expensive than poor ones.
Fortunately, therefore, relative cost can be considered a minor matter, as
a rule, and the choice of the test can rest, as it should, upon its validity and
reliability for the purpose it is to serve. It must be remembered that one
test may be cheap enough at fifteen cents and another too costly at five.
After all, the careful purchaser is more concerned with what he gets for his
money than with what he has to pay.
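
The remark that a positive correlation may still be “too low for accurate
prediction” can be made concrete. The sketch below uses the standard relation
between a correlation r and the standard error of estimate, which equals
sqrt(1 - r^2) times the standard deviation of the thing predicted. The value
r = 0.30 is purely hypothetical; the text gives no figure for the correlation
between cost and worth, and the function names are illustrative only.

```python
import math


def error_ratio(r):
    """Standard error of estimate as a fraction of the criterion's standard deviation."""
    return math.sqrt(1.0 - r * r)


def forecasting_gain(r):
    """Proportional reduction in prediction error compared with guessing the mean."""
    return 1.0 - error_ratio(r)


# Hypothetical correlation between the price of a test and its worth.
r_cost_worth = 0.30

print(f"Error remaining: {error_ratio(r_cost_worth):.3f} of the original spread")
print(f"Error removed:   {forecasting_gain(r_cost_worth):.3f}")
```

Under this assumed correlation, knowing the price of a test removes less than
a twentieth of the error made by simply guessing average worth for every test.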

Mechanical make-up of the test. Tests issued by the larger publishers
are almost always printed in clear type of a size appropriate to the grade
level for which they are intended. But there are some exceptions. One of
the leading publishers issued a test in which the key word in each sentence
was supposed to be in bold-faced type, but in many cases the poor quality
type did not clearly indicate which word was intended. On a timed test
such as this one, a handicap was imposed upon all pupils except those with
the keenest vision. In the lower grades careful attention should be given
to the quality of pictures and illustrations used. In the earlier days it was


. “ Certnide n IliUrrlh and Harold H haVr Manual for Inter 

Nea\ork WorlJHook 

il S? tu 'y ** ‘<=-1 order* howiver 




common to have the instructions to the examiner appear on the test booklet
in the hands of the pupil. This practice not only meant a needless cost in
paper and printing, but was possibly a source of confusion to the pupil.
Commercial publishers of tests have not as yet given sufficient attention
to devising tests which will reduce to the minimum their cost in time and
money. There appears to be no valid educational reason why most tests
should not be designed with separate answer sheets, a practice which not
only eliminates the economic waste of using the test one time only and then
discarding it, but which also greatly facilitates the scoring and makes the
pupil's test profile available in convenient form for use and filing. Little is
gained, however, when the answer sheet itself is sold at prices almost as
high as the test itself, as is now often the case. The customer usually gets
what he wants when he wants it badly enough and makes his wishes known.
To make convenient, inexpensive tests feasible, moreover, the demand
must be sufficiently increased so that the additional volume sold
compensates for reduced profit per unit.

Summary. What, then, are the earmarks of a good measuring instrument?
In brief, a good test or other measuring instrument possesses three
outstanding qualities: validity, reliability, and usability. In other words,
a good test measures what it claims to, consistently, and with a minimum
expenditure of time, energy, and money. But always the first consideration is
validity. The test must not only measure what it purports to, but in the
case of achievement tests, it should purport to measure the really important
outcomes of the educational process. No achievement test that fails to do
this can be considered a satisfactory measuring instrument, whether made
by the classroom teacher or purchased from a publisher of standard tests.


B. Some Generalizations Regarding the Problem of
Measurement

The role of measurement in science in general and in education in
particular was set forth in the first chapter, the historical development of
measurement in education was traced in the second chapter, quantitative
concepts were discussed in the third chapter, and the characteristics of a
satisfactory measuring instrument were described in the present chapter.
In the light of these discussions a few important generalizations will now
be attempted.


1. Some kind of measurement or evaluation is inevitable in education. This
generalization is amply supported by the history of every recognized
science, and of education itself, regardless of whether it is to be classified

... to error. This is true of the so-called “exact
sciences”; to a greater degree it is true of the less exact or newer social
sciences, such as psychology and education. Westaway, for example,
thinking mainly of physics and chemistry, concludes: “We may, in fact, look
upon the existence of error in all measurements as the normal state of
things.” Kelley speaks of the “ubiquitous probable error” in psychology
and education. These errors can be reduced but never wholly eliminated.
3. These errors of measurement are due in part to the imperfection in the
measuring instruments available. There are no perfect measuring
instruments, even in the physical sciences. Westaway, for example, says that
“even the very best of the instruments with which we perform our
measurements are imperfect.” This is true of the fundamental units of
measurement in the physical sciences, as well as of the biological and social sciences.
No astronomer knows precisely the velocity of light, and yet the light year
is the yardstick of celestial measurement; no chemist knows the precise
value of a single atomic weight, and yet it is the basic unit in chemical
analysis. In psychology and education these imperfections are an even more
potent source of error than in the older sciences. However, it must be
remembered that the tools of measurement are much better than they used
to be.

4. The limitations of the methods used are a still more important source of
error in measurement. Again this difficulty is true of the physical sciences
as well as of the social sciences. For example, Max Planck says that in
physics “every measurement, however exact, inevitably involves certain
errors of observation.” These errors are due partly to sensory and
temperamental defects, and partly to lack of skill in the observer. But a still
more troublesome source of error is the tendency for the act of observation
to interfere with the phenomena being observed in measurement.
Heisenberg, for example, noted that the “measurement of an electron's velocity
is inaccurate in proportion as the measurement of its position in space is
accurate, and vice versa,” owing to the disturbing influence of the light
rays falling on it. This discovery resulted in
the famous “uncertainty principle” or the “principle of indeterminacy,”
which has profoundly influenced modern physics. “As a matter of fact
every measurement,” says Planck, “whatever the method of its
employment, invariably interferes more or less with the event to be measured.”
But this interference is so slight as to be only of theoretical interest to the
laboratory physicist engaged in the study of aggregates of elements instead
of individual electrons. And the ordinary Newtonian principles of chem-


"F t\ Sctnl.fcVM "• rhxhmph,ca!Jlaa„ and lU Mala of 

calion 2S^200 Newyotk Hillman Curl Inc, 1917 ^ 

Kc\le> op of pace 19 
•• I V\ W e«lairaj op nl page 2SC 

^orloni 

** ll/iil pagp C7 
” /fcvl pagprs 




c<h^ 3 „'„’,T 3^”"'''"'’’ f procoss ,s more eenorrsm 

... The personality of the examiner, as well as the testing materials,
... situation. This is recognized in giving individual
... and subject is regarded as
essential to a successful examination. But here, even with skilled examiners,
the factor is rarely eliminated altogether, for it has been found that the IQ
remains more stable when the same examiner gives all the tests. In all
experiments, whether involving the use of individual or group tests, the
subjects are not purely naive and receptive creatures but are actuated by
motives of pride, desire to please or make a good impression on the
examiner, and the like. In other words, the examiner or experimenter is an
important part of the situation, and it is doubtful whether standardized
instructions can ever reduce this part to the point at which it is negligible.
The factor is especially important in personality measurement and in the
evaluation of social behavior. Certainly one would hardly expect to get as
normal reactions of love-making in the psychological laboratory during the
day as he would if he were concealed in a tree beside a bench in the park
during the evening.

5. Teachers and school administrators must not only understand and
appreciate the functions of measurement in education, but they must realize more
fully the limitations of present measuring instruments. In the present state
of measurement two erroneous attitudes are sometimes found. The first is
that held by certain over-enthusiastic supporters of measurement, who make
unreasonable claims for existing measuring instruments, and who gloss over
or refuse to recognize the imperfections that exist. This attitude is not
unlike that of the adolescent in his first love affair, where, indeed, if love
is not actually blind, it deliberately closes its eyes, and in any event the
result is the same. This point of view is unfortunate and unintelligent, for
it stands in the way of progress toward needed improvements; making such
unwarranted claims is the surest way to discredit the movement with
thoughtful people. Fortunately, this attitude appears to be on the
decline.

For a stimulating discussion of the practical implications of this uncertainty
principle by a distinguished American chemist see Irving Langmuir, “Science, Common
Sense, and Dec. nc^ ’’ Aarnw 43 M, 12 lo Jiouary 2 , , , 

Elinor L. Sacks, “Intelligence Scores as a Function of Experimentally Established
Social Relationships Between Child and Examiner,” Journal of Abnormal and Social
Psychology, 47:354-358, April Supplement, 1952.

For a suggestive analytical discussion of the problem see Douglas E. Scates,
“Differences Between Measurement Criteria of Pure Scientists and of Classroom
Teachers,” Journal of Educational Research, 37:1-13, September, 1943.

.::rs/eotii:rffir,d^^ 

pie. for ean.tv ,n „t,„. U.,. In .,o p,on«r, r„n„r„,.,on are 





But a second and equally erroneous attitude goes to the opposite
extreme. It characterizes those who are as blind to the virtues of existing
measuring instruments as the first group are to their limitations, and who
refuse to have anything at all to do with tests and examinations until all
defects are forever removed. This attitude is as unreasonable as that of the
farmer who has postponed buying an automobile “till them blamed things
is perfected,” and who has in the meantime worn out a great deal of shoe
leather without seeing much of the world either.

Then there is the third attitude, that of the practical person who has
learned through experience not to expect perfection. Moreover, he has
found that excellent work can often be turned out with imperfect tools,
if only they are used with sufficient skill. He has also discovered that
greater skill is called for than if the instruments were perfect, and he sets
out deliberately to attain the skill needed. He realizes that the very
existence of these imperfections imposes a special obligation upon the user to
seek to understand as fully as possible their nature in order to get desired
results in spite of them. Furthermore, he makes a conscious effort in
interpreting and using test results in order to take into account the existence of
errors. In other words, he takes the very common-sense point of view that
the proper thing to be done under the circumstances is to make the best
possible use of such tools as exist, while waiting for better ones to be
developed.




PART II


The Construction 

of 

Teacher-Made Tests 



5 

General Principles of Test Construction 


Importance of the problem. There are at least three reasons why the
development of proficiency in constructing informal teacher-made tests is
important. In the first place, the vast majority of tests in use by classroom
teachers are of this type. In the second place, both essay examinations
made and marked by untrained teachers and objective tests used by
ordinary classroom teachers may produce highly unsatisfactory results. The
extensive literature on essay examinations, briefly summarized in
Chapter 2, has repeatedly demonstrated this fact. Amateurs may at times do
even worse with objective tests than with essay examinations. Incredible
as it may be, it does seem possible to make objective tests of lower
reliability than essay examinations. In the third place, both logical
considerations and statistical analyses indicate that skillfully prepared informal
tests are as reliable and as valid as some standardized tests. In fact, where
the teaching conditions are unusual, or where the subject matter is not
thoroughly stabilized, as in civics and modern history, such tests may be
even more valid. A state-wide survey of high school achievement conducted
in Tennessee, for example, showed that only 56 per cent of the questions
in the standardized social studies test then in use could be answered from
the state-adopted textbook.
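
The Tennessee finding is, in effect, a count of how many test questions fall
within what was actually taught. A teacher can make the same count for any
published test. The sketch below assumes a hypothetical list giving one topic
label per test item and a hypothetical set of topics covered in the local
course; neither the labels nor the function name come from the survey itself.

```python
def coverage(item_topics, taught_topics):
    """Fraction of test items whose topic appears among the topics taught."""
    if not item_topics:
        raise ValueError("the test has no items")
    matched = sum(1 for topic in item_topics if topic in taught_topics)
    return matched / len(item_topics)


# Hypothetical data: one topic label for each item on a published test.
item_topics = ["courts", "tariffs", "elections", "city charters", "treaties"]
# Hypothetical set of topics treated in the local textbook and course.
taught_topics = {"courts", "elections", "treaties"}

share = coverage(item_topics, taught_topics)
print(f"{share:.0%} of the items can be answered from the course as taught")
```

A low figure does not condemn the test in general; it suggests only that the
test is a poor fit for this particular course.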

This chapter will consider the general principles of constructing informal 


Iven Hensley and Robert A. Davis, “What High School Teachers Think and Do
about Their Examinations,” Educational Administration and Supervision, 38:219-228.

John M. Stalnaker, “The Essay Type of Examination,” Chapter 13 in E. F.
Lindquist (Editor), Educational Measurement. Washington, D. C.: American Council on
Education, 1951.

Jos. E. Avent, Report of the Tennessee State Testing Program, page 83. Nashville:
State Department of Education, 1946.



teacher-made tests grouped under the following four headings, which
indicate roughly the steps or stages in the process:

1. Planning the test

2. Preparing the test

3. Trying out the test

4. Evaluating the test


A. Planning the Test

It should be recognized at the outset that the construction of satisfactory
measuring instruments is one of the most difficult duties the teacher has
to perform. Good tests do not just happen. Nor are they the result of a few
moments of high inspiration or exaltation. On the contrary, the process is
calm, deliberate, and time-consuming. Perhaps the best that can be hoped
for under existing conditions is that the teacher prepare reasonably
comprehensive and adequate informal tests in one subject each year. Best
results will usually be obtained from cooperative effort. The procedure
employed in developing the Cooperative Achievement Tests, outlined on
page 112, is a good illustration. Another example is the cooperative plan
used by 19 colleges in constructing examinations. A third illustration is the
procedure followed by the Evaluation Staff of the Eight-Year Study
sponsored by the Progressive Education Association. The seven major steps in the
process as set forth in detail by Smith and Tyler may be summarized
briefly as follows:

1. The faculty of each school was asked to formulate a careful statement of its
educational objectives.

2. Statements from the thirty schools were classified by the Evaluation Staff
into ten major types of objectives.

3. Each type of objective was then defined in terms of expected pupil behavior.

4. Situations were suggested in which pupils could be expected to show the
particular kind of behavior.

5. The more promising methods of obtaining evidence regarding each type of
objective were then selected from existing techniques or devised by the staff, and
subjected to experimental trial.

6. The methods which made the best showing in this preliminary trial were
further developed and improved.

7. Means were devised for the interpretation and effective use of the various
instruments of evaluation.


It is recognized that the process just described is too elaborate for the
ordinary school or for the individual teacher working on his own. However,


1 in Gincoil Education Proad.ng, 0 } lit 

L,'catriT«nrbr“c7or"'‘”''''^ 

-nderthcen 

Sll ‘,T/’ coo, .‘:T i’c,’u 'V' V na™"'! Kif cf 





it cannot be emphasized too strongly that the actual process of test
construction must be preceded by careful planning if the test is to succeed.
The test will be no better than the quality of the thinking that goes into it.
In planning the test, consideration must be given to the nature of the
objective to be measured, the purpose it is to serve, and the conditions under
which it will be used.

1. Adequate provision should be made for evaluating all the important
outcomes of instruction. A careful statement of the philosophy of the school
and the objectives of the particular course should be available from the
start. A survey of a representative sample of 1060 state, county, and city
courses of study revealed that only 13 per cent contained no statement of

... terms must be broken down and stated in usable ...

With the list of teaching objectives for the course ... stated, the teacher
is ready to ... appropriate ... attainment of each objective. In other words,
... attempts to test what he has tried ... each objective ...

... objectives of instruction may be grouped
into eight major categories:

1. Functional information

2. Various aspects of thinking

Cmst? aims P-n>o- 
6. Social adjustment and social sensitivity

n—ng social philosophy 

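Another classification contains ten major types:

1. The development of effective methods of thinking
2. The cultivation of useful work habits and study skills
3. The inculcation of social attitudes
4. The acquisition of a wide range of significant interests
5. The development of increased appreciation of music, art, literature, and
other aesthetic experiences
6. The development of social sensitivity
7. The development of better personal-social adjustment
8. The acquisition of important information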

B E laiary Office 

Eviluating the Prograiu 


E Leary A ,[ed States Office 

"“.'Sh 





9. The development of physical health
10. The development of a consistent philosophy of life

For any given course these objectives must be expressed in terms of the
specific changes in pupils which the teacher is seeking to bring about.
A rather detailed inventory of the particular facts, principles, concepts,
and skills of the course is required, as well as the specific mental processes
the pupil is expected to employ. To measure whether these processes are
really functioning, the teacher's inventory just mentioned must be
presented to the pupils in language different from that of the text and class
discussion, and opportunities must be offered to apply or to relate the
objectives to new problems and situations. The center of gravity is the
behavior of pupils rather than subject matter. The teacher must not
confuse ends and means. The true relationship has been stated as follows:
“The real ends of instruction are the lasting concepts, attitudes, skills,
abilities, and habits of thought, and the improved judgment or sense of
values acquired; the detailed materials of instruction, the specific factual
content, are to a large extent only a means toward these ends.”

A group of English teachers, for example, were able to recognize seven
different aspects of “appreciation of literature.” They then suggested the
following ways in which these aspects of appreciation may manifest
themselves in pupil behavior:


1. Satisfaction in the Thing Appreciated. Appreciation manifests itself in a feeling,
on the part of the individual, of keen satisfaction in and enthusiasm for the thing
appreciated. The person who really appreciates a given piece of literature finds in
it an immediate, persistent, and easily renewable enjoyment of extraordinary
intensity.

2. Desire for More of the Thing Appreciated. Appreciation manifests itself in an
active desire on the part of the individual for more of the thing appreciated. The
person who really appreciates a given piece of literature is desirous of prolonging,
extending, supplementing, renewing his first favorable response toward it.

3. Desire to Know More about the Thing Appreciated. Appreciation manifests
itself in an active desire on the part of the individual to know more about the thing
appreciated. The person who really appreciates a given piece of literature is desirous
of understanding as fully as possible the significant meanings which it aims to


in the rri"riti.?“VpikmY7oftTrr*\“erak"7j'*“'l'°" f '’''““Pf EjpencnceamI 
im srveioran-, r ducahonal Rreord, 2o SOG Ocio\xr, 

1 rnrlo^k of X arS' 't J Achiev emeiU, ' Thirti/Sevailh 

Qiiotrtl 1 V ,, nm« on of ihe h^arl I, pages 114-115 

Co'T.Inn^ 1'^" TltL. reference conliiiJ«,nIe*^2^" Public School Pubh-hing 

• pprerulion of literature atUtu hcl iTowanf^ designed to measure 

of t» . ibnc Al^ \\ A important 

ltlrr.Unlins ^ "^irhflh i rrjrlook oflk^ \^t “The Mea-nirement of 

I an I ( Lieae-3 Ltiiver*itj of Chtrago Presij igig ^ ^‘udj of kduaihon. 





4. Desire to Express One's Self Creatively. Appreciation manifests itself in an
active desire on the part of the individual to go beyond the thing appreciated, to
give creative expression to ideas and feelings of his own which the thing appreciated
has chiefly engendered. The person who really appreciates a given piece of literature
is desirous of doing for himself, either in the same or in a different medium,
something of what the author has done in the medium of literature.

5. Identification of One's Self with the Thing Appreciated. Appreciation manifests
itself in the individual's active identification of himself with the thing appreciated.
The person who really appreciates a given piece of literature responds to it very
much as if he were actually participating in the life situations which it represents.

6. Desire to Clarify One's Own Thinking with Regard to the Life Problems Raised
by the Thing Appreciated. Appreciation manifests itself in a desire on the part of
the individual to clarify his own thinking with regard to specific life problems
raised by the thing appreciated. The person who really appreciates a given piece
of literature is stimulated by it to rethink his own point of view toward certain of
the life problems with which it deals and perhaps subsequently to modify his own
practical behavior in meeting these problems.

7. Desire to Evaluate the Thing Appreciated. Appreciation manifests itself in a
conscious effort on the part of the individual to evaluate the thing appreciated in
terms of such standards of merit as he himself, at the moment, tends to subscribe
to. The person who really appreciates a given piece of literature is desirous of
discovering and describing for himself the particular values which it seems to hold
for him.


Arnold has shown that critical thinking can be taught in the elementary
school and that various phases of the process can be measured. His study
assumed that critical thinking involves the intelligent use of data, which
was defined as the “ability to recognize relevance, dependability, bias in
source, and adequacy of data in regard to a particular problem, question,
or conclusion.” The following item has to do with the recognition of bias
in data:


Three boys were talking about whether or not a boy, Jim, was really “out” in a
game of baseball that had been played that afternoon. John was on Jim's side.
George was on the other side. Bill was not playing but was watching the game.
Which of the three boys is most likely to be right?

Why? ______


Recognition of the adequacy of data was measured by such test
situations as the following:

Some people were talking about a ball team. Someone asked the others to tell
why they thought this team was good. Here are the answers. Read them carefully.
Put B before the one that you think is the best reason for thinking
this is a good ball team. Put P before the one you think is poorest, and put F
before the one you think is just fair.

___ 1. I think this is a good ball team.
___ 2. I have seen them play once and I think they are good players.
___ 3. I have seen them play several times and they are good players.

Arnold, “Testing Ability to Use Data in the Fifth and Sixth Grades,”
Educational Research Bulletin, 17



142 THE CONSTRUCTION OF TEACHER-MADE TESTS 


9 The development of phj sical health 
10 The de\ elopment of a consistent nhilosophy of life 

For anj given course these objectives must be cvprcsscd m terms of the 
specific changes in pupils which the teacher is seeking to bring about 
A rather detailed inventory of the particular facts, pnnciples, concepts 
and skills of the course is required, as well as the specific mental processes 
the pupil IS expected to employ To measure whether these processes are 
really functioning the teacher’s inventory just mentioned must be pre 
sented to the pupils in language difTerent from that of the text and class 
discussion and opportunities must be olTered to apply or to relate the 
objectives to new problems and situations Ihe center of gravity is the 
behavior of pupils rather than subject matter The teacher must not con 
fuse ends and means The true relationship Ins been stated as follows 
“The real ends of instruction arc the lasting concepts, attitudes skills 
abilities and habits of thought, and the improved judgment or sense of 
values acquired, the detailed materials of instruction — the specific factual 
content — are to a large extent only a means toward these ends 

\ group of English teachers for example were able to recognize seven 
different aspects of ‘ appreciation of literature ” They tlicn suggested the 
following ways m which these aspects of appreciation may manifest them 
selves m pupil behavior “ 


1 Satisfaction in the Thing Appreciated Appreciation manifests itself m a feeling 
on the part of the mdividual in keen satisfaction in and enthu'siasm for the thing 
appreciated The person who really appreciates a given piece of literature finds in 
it an immediate persistent and easily renewable enjoyment of extraordinary 
intensity 

2 Desire for Mare of the Thing Appreciated Appreciation manifests itself m an 
active desire on the part of the individual for more of the thing appreciated The 
person who really appreciates a giveu piece of literature is desirous of prolonging 
extending supplementing renewing his first favorable response toward it 

3 ZJesirc to Anmo More about ihe Thing Appreciated Appreciation manifests 
itcelf m an achv e desire on the part of the mdiv idual to know more about the thing 
appreciate person who really appreciates a given piece of literature is desirous 
of understanding as fully as possible the «=ignificant meanmgs which it aims to 


m thf BluVrbon-lfpi ■>( Testa u, the Accreditation of Military Experience and 

inthe Educational PI, cementoIHarAetcrana Edv,cat,«ml Remrd 25 368 October 

UnStandm " F„,f„ A™ (Chairman) The Alea.ureinent oI 

/on I Chicago 




4. Desire to Express One’s Self Creatively. Appreciation manifests itself in an
active desire on the part of the individual to go beyond the thing appreciated, to
give creative expression to ideas and feelings of his own which the thing appreciated
has chiefly engendered. The person who really appreciates a given piece of literature
is desirous of doing for himself, either in the same or in a different medium,
something of what the author has done in the medium of literature.

5. Identification of One’s Self with the Thing Appreciated. Appreciation manifests
itself in the individual’s active identification of himself with the thing appreciated.
The person who really appreciates a given piece of literature responds to it very
much as if he were actually participating in the life situations which it represents.

6. Desire to Clarify One’s Own Thinking with Regard to the Life Problems Raised
by the Thing Appreciated. Appreciation manifests itself in a desire on the part of
the individual to clarify his own thinking with regard to specific life problems
raised by the thing appreciated. The person who really appreciates a given piece
of literature is stimulated by it to rethink his own point of view toward certain of
the life problems with which it deals and perhaps subsequently to modify his own
practical behavior in meeting these problems.

7. Desire to Evaluate the Thing Appreciated. Appreciation manifests itself in a
conscious effort on the part of the individual to evaluate the thing appreciated in
terms of such standards of merit as he himself, at the moment, tends to subscribe
to. The person who really appreciates a given piece of literature is desirous of
discovering and describing for himself the particular values which it seems to hold
for him.

Arnold has shown that critical thinking can be taught in the elementary
school and that various phases of the process can be measured. His study
assumed that critical thinking involves the intelligent use of data, which
was defined as the “ability to recognize relevance, dependability, bias in
source, and adequacy of data in regard to a particular problem, question,
or conclusion.” The following item has to do with the recognition of bias
in data:

Three boys were talking about whether or not a boy, Jim, was really “out” in a
game of baseball that had been played that afternoon. John was on Jim’s side.
George was on the other side. Bill was not playing but was watching the game.
Which of the three boys is most likely to be right? ______

Why? ______

Recognition of the adequacy of data was measured by such test situations
as the following:

Some people were talking about a ball team. Someone asked the others to tell
why they thought this team was good. Here are the answers. Read them carefully.

Put B before the one of these three that you think is the best reason for thinking
this is a good ball team. Put P before the one you think is poorest, and put F
before the one you think is just fair.

___ 1. I think this is a good ball team.

___ 2. I have seen them play once, and I think they are good players.

___ 3. I have seen them play several times, and they are good players.

Dwight L. Arnold, “Testing Ability to Use Data in the Fifth and Sixth Grades,”
Educational Research Bulletin, 17:255-259, 278, December 7, 1938.



Read these next two. Find the better reason of the two; place a B before it.
Find the poorer reason; place a P before it.

___ 1. A man who studies baseball and writes much about it said they were a
good team.

___ 2. A man I talked to on the street the other day said they were good players.

Some boys were talking about a boy they called D. Jim said, “I saw D take
a pencil from another pupil’s desk. This makes me think he is a thief.” If this is all
that Jim knew about this, do you think Jim is right in thinking that D was a thief?

Put a line under your answer: YES NO AM NOT SURE

Now tell why you answered as you did. ______


It must be recognized, moreover, that some of the objectives of instruction
cannot be measured by paper-and-pencil tests of this kind. At times
rating scales, check lists, and other devices for recording observations of
the individual at work or play are required. The term “test” in this
discussion includes any instrument that affords valid evidence of progress made
by pupils toward the attainment of the objectives of instruction.

One of the least tangible objectives of instruction is creativeness. Grimes
and Bordin have proposed that creative expression in art should result
in the development of certain personality traits. These traits would be
evaluated by the art teacher through observation during a conversation
with his pupils. This process is guided by a check list upon which the
teacher’s record is entered. Grimes and Bordin suggest that this technique
would be more valuable if a group of teachers co-operated in the construction
of a check list of their own, rather than adopted wholesale the list
given below.

I. Initiative: willingness to go into the unknown, to start off on a new track, to
attempt something never attempted before; perseverance after recognizing a dead
end; willingness to try again.

1. Attempts a medium, technique, or subject never attempted before.

2. Does not accept as final the view of the subject which he happens to have in
the place where he begins, but moves around and views subject to be painted
from many angles before deciding where to work.

3. Does not take for granted the posed object to be painted, but views the total
situation and sees what, for him and his experience, there is in it that is
paintable.

4. Assists in posing the model or arranging still life, and the like.


For stimulating discussions of this general topic, see Joy P. Guilford, “Creativity,”
American Psychologist, 5:444-454, September, 1950, and Louis L. Thurstone, “Creative
Talent,” Chapter 2 in Applications of Psychology. New York: Harper & Brothers, 1952.
James W. Grimes and Edward Bordin, “A Proposed Technique for Certain Evaluations
in Art,” Educational Research Bulletin, 18:1-6, 29, January 4, 1939.





5. Brings material to be painted, as still-life objects and the like, to the studio
from outside sources.

6. Starts to work rather than depending on the teacher for instructions as to how
to proceed.

7. ... but pursues the work despite reverses and ...

(a) Scrapes off, in painting in oils, the thick paint when canvas is gummed up;
in water color washes out when such steps are necessary for continued
work on the painting.

(b) Does not regard initial drawing for a composition as absolute but moves
shapes as the developing experience demands adjustments and changes.

8. Participates in class discussions and contributes ideas and experiences.

9. Pursues some meaningful activity, as sketching, if he finishes before the others
in the class, rather than stalling around.

10. Does not demand approval or supervision from the teacher or other students
at almost every step in his work.

11. Places his work away from himself, changes viewpoint in order to get a more
objective view of his work.


II. Concentration, Interest, Motivation: vigor with which an individual attacks
a problem and his oneness of purpose which would result in excluding factors
nonrelevant to a given problem. (Perseverance is implied here.)

1. Not distracted by others coming and going or talking.

2. Does not come and go himself.

3. Does not idly converse about matters exterior to the situation.

4. Does not let the work of others distract him from his own problems.

5. Does not quickly come to a dead end in his own work.

6. Does not stall around pretending to be working on a project.

7. Does work outside of the class that has bearing on class work, as go to gallery,
sketch, draw, and consult reproductive material.

8. Works on painting after hours.

9. Requests information as to work relative to his development.

10. Contributes to class discussion.

11. Talks to friends about his work and attempts to explain what he has been
accomplishing and learning.

12. Paints the same subject more than once; does not give out quickly as regards
subject matter.

13. Does not continually consult the time.

III. Judgment: weighing the factors in a situation and taking them into account
before initiating a new action, that is, considering the possible results of
an action before the initiation of it; seeing the social implications of a proposed
action. Implied in this is knowing when to go off into the unknown and when not
to, knowing when to pursue an independent course and when not to.

1. Does not aid others to the point of interfering with the progress of his own
work or that of others.




2. Attempts to understand the point of view of others rather than thinking of
ways to justify himself.

3. Selects a position to work which does not obstruct another’s view of the
subject.

4. Does not talk so loudly that it distracts others who are working.

5. Takes into consideration the desires and interests of others when arranging a
subject to be painted.

6. Analyzes his working situation in relationship to time, when there is a
group-determined time limit.

7. Knows when to pursue a project further and when to discard it and start
another.

8. Uses materials and cares for them efficiently: cleans palettes, washes brushes,
and the like.

9. Works plastically; that is, he allows for working with the forms rather than
setting down an architectural plan as a rigid drawing which is filled in with
color. Shows evidence of an exploratory and feeling-out attitude rather than
a rigid method of working.

10. Examines critically criticism by others and makes use of it only so far as he
feels it significant; that is, he reacts to it in terms of its validity rather than
emotionally.

11. Takes into consideration the needs of others when using group materials.

12. Avoids vacillation in following out his own painting rather than shifting in
style, execution, and attitude, as he sees others in the class going in a direction
different from his own.

IV. Cooperation: the willingness to work in a group as a member of it in
relationship to the teacher, individual members, and the whole group.

1. Makes use of criticism; does not react to it as personal insult, or cry, show
anger, or leave class.

2. Is willing to alter his personal objectives to meet the situation.

It must be kept in mind that the objectives of the course represent
directions of progress rather than destinations to be arrived at by individual
pupils at any particular time. As far as possible, the progress of each
individual should be measured in terms of his own interests, needs, and abilities.
This is the aim of the modern school. The degree to which it is actually
attained in any particular situation is dependent upon the resources
available as well as upon the educational philosophy and skill of the teaching
staff.

2. The test should reflect the approximate proportion of emphasis in the
course. To insure a reasonable balance in the test it is essential to draw up
in outline form a sort of “job analysis” or “table of specifications.” This
will guide the test maker much as the architect’s blueprint and specifications
guide the building contractor. It is well to indicate not only the various
objectives the teacher has had in mind but also, at least roughly, the
relative amount of emphasis each objective has received in the actual teaching
of the course. For example, the same test might not be equally valid
for two teachers of a course in general science using the same textbook.
This would be the case if one teacher emphasized almost altogether the
memorization of isolated facts while the other was much more concerned
with the understanding of facts in relation to other facts and in their
application to practical problems in the community. The test should attempt
to reflect faithfully the teaching emphasis. The amount of time devoted by
the teacher to the various topical divisions of the course is a rough
indication of what he considers to be their relative importance. The content of
the test should show a similar proportion. The time devoted to a topic can
at best indicate only the number of items to be included and not the type
of the items. The type of items to be used will depend upon the nature
of the objective to be measured. A topical outline is only a partial guide to
test construction. The table of specifications should also indicate the
approximate teaching emphasis from the standpoint of knowledge, skills,
attitudes, and other types of objectives that have been sought.

For a systematic approach to specifying the content of individual items, see John C.
Flanagan, “The Use of Comprehensive Rationales in Test Development,” Educational
and Psychological Measurement, 11:151-155, Spring, 1951.
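
The proportional rule in principle 2 (the number of items on a topic should roughly follow the time devoted to it) can be illustrated with a short sketch. This is only one possible way of doing the arithmetic, not a procedure given in the text: the function name `allocate_items`, the topic names, and the class-period figures are all hypothetical, and the rounding method (largest remainder) is an assumption chosen so that the counts add up to the intended test length.

```python
def allocate_items(class_periods, total_items):
    """Split total_items across topics in proportion to teaching time.

    class_periods maps topic -> time spent on it (any consistent unit).
    Largest-remainder rounding keeps the counts summing to total_items.
    """
    total_time = sum(class_periods.values())
    exact = {t: total_items * p / total_time for t, p in class_periods.items()}
    counts = {t: int(x) for t, x in exact.items()}  # round every share down
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the topics with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for topic in by_remainder[:leftover]:
        counts[topic] += 1
    return counts


# Hypothetical general-science unit: class periods spent on each topic.
periods = {"weather": 10, "simple machines": 6, "electricity": 4}
plan = allocate_items(periods, total_items=50)
print(plan)  # {'weather': 25, 'simple machines': 15, 'electricity': 10}
```

As the text notes, such a tally settles only how many items each topic receives. The type of item still has to be chosen from the nature of the objective, so a fuller table of specifications would cross the topics with kinds of objectives (knowledge, skills, attitudes).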

3. The nature of the test must take into consideration the purpose it is to
serve. Any test is valid to the degree that it serves a specific purpose. If the
purpose of the test is to afford a basis for school marks or for classification,
it will attempt to rank the pupils in order of their total achievement. But
if the purpose of the test is diagnosis, its value will depend upon its ability
to reveal specific weaknesses in the achievement of individual pupils.
Diagnostic tests would cover a limited scope, but in much greater detail than
a test of general achievement, and would be arranged so as to reveal the
scores on the separate parts. The range of difficulty of the items and the
discriminating value of the items individually are relatively less important
in diagnostic tests. This is also true of mastery tests administered at the end
of a teaching unit to determine when the minimum essentials have been
achieved.

4. The nature of the test must take into consideration the conditions under
which it is to be administered. In planning the test, attention must be given
to such factors as the time available for testing, the facilities for duplicating
the tests, and the cost of the materials, as well as the age and experience
of the pupils being tested.


B Preparing the Test 

The second step is the actual preparation of the test. It has been found
from experience that the following rules or suggestions are helpful.

1. The preliminary draft of the test should be prepared as early as possible.
Many teachers find it desirable to jot down items to be included in the tests
day by day as the teaching progresses. This is reasonable assurance that
no important point in the course will be omitted in the test. If this is not
done, the supplementary material of the course which is not included in the
textbook and which may be of unusual value is especially likely to be
overlooked. This practice also permits the material to “grow cold” and
consequently to be more correctly appraised before it is included in the final
draft of the test.

2. The test may include more than one type of item. A variety of test types
is likely to be more interesting to the pupil than a single form. This is
especially true of long tests. Moreover, the requirement that the type of
test situation should be the one which is most appropriate to the material
to be included will sometimes necessitate that three or four forms of
objective items be used. These objective items are frequently combined with
one or more discussion questions to make up the test.

3. Most of the items in the final test should be of approximately 50 per cent
difficulty after being “corrected for chance” by the procedure described in
footnote 32 on page 160. That is, about half of the group should “know”
the answer to each item, while the other half should not. This requirement
cannot be met very closely in the typical school situation, however, because
item difficulties in the preliminary form of the test will vary considerably.
There will usually be too few items to permit discarding those not close to
the 50 per cent mark. A suggested rule-of-thumb method for constructing
the final (post-tryout) form is this: For motivational purposes, let Items 1
and 2 be so easy that almost nobody will miss them. Put aside (do not use)
all other items whose correct answer was “known” by less than 16 per cent
or more than 84 per cent of the students in the tryout group. Then let
Item 3 be the easiest of the remaining items, one “known” by about 84 per
cent of the persons tested. Arrange the other items in ascending order of
difficulty, with the hardest ones at the end of the test.
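
The rule-of-thumb method just described lends itself to a small worked sketch. The following is an illustration under stated assumptions rather than the book's own procedure: the function name `arrange_final_form` and the tryout figures are hypothetical, the proportions are taken to be already corrected for chance, and the two easiest tryout items are simply assumed to be easy enough to serve as Items 1 and 2.

```python
def arrange_final_form(known, easy_starters=2, low=0.16, high=0.84):
    """Order tryout items for the final form of the test.

    known maps item id -> proportion of the tryout group that "knew" the
    answer (assumed to be corrected for chance already).
    """
    # Rank from easiest (highest proportion) to hardest.
    ranked = sorted(known, key=known.get, reverse=True)
    # Items 1 and 2: the easiest items, kept for motivational purposes.
    starters = ranked[:easy_starters]
    # Set aside every other item known by fewer than 16 or more than 84 per cent.
    rest = [item for item in ranked[easy_starters:] if low <= known[item] <= high]
    return starters + rest


# Hypothetical tryout results for eight items.
tryout = {"a": 0.97, "b": 0.95, "c": 0.90, "d": 0.84,
          "e": 0.60, "f": 0.45, "g": 0.20, "h": 0.10}
order = arrange_final_form(tryout)
print(order)  # ['a', 'b', 'd', 'e', 'f', 'g']
```

Here item c (90 per cent) and item h (10 per cent) are set aside, item d becomes Item 3, and the hardest retained item comes last.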

The above discussion implies that for maximum discrimination the
difficulty of the entire test should be such that, when allowance is made for
chance, the average pupil in the group makes about 50 per cent of the
possible score. It is clear, then, that a test which is of ideal difficulty for one
class may be too easy or too difficult for other classes.

Perhaps one of the worst defects of most teacher-constructed tests is
failure to make the items difficult enough, probably in large measure
because of the influence of the “70 per cent is passing” tradition. In order to
pass all but a few students, the teacher who grades on a percentage-right
basis must build tests for which the average score is 80 per cent or more,
thereby causing the items that make up the test to be too easy for efficient
measurement.

A few exceptions to this principle should be noted. In speed tests for such
subjects as arithmetic and typewriting, where the objective is rate rather
than power, all items should be rather easy. Also, in both mastery and
diagnostic tests the content is determined primarily by the importance of the
subject matter rather than its difficulty. An adequate diagnostic test in the
fundamental combinations in addition, for example, might yield many
almost perfect scores in a strong class, and scores well below 50 per cent
in a weak class.


4. It is usually desirable to include more items in the preliminary draft of
the test than will be needed in the final form. This will permit “culling out,”
later on, items that may appear weak or not needed to produce the proper
balance in the test. For each subdivision of the test, from 25 to 50 per cent
more items should be prepared than are likely to be required.

5. After some time has elapsed, the test should be subjected to a critical
revision. Then the items should be carefully checked with the table of
specifications to see that the test shows the desired emphasis upon the various
topics. A careful reading of the test after an interval of time will usually
reveal some objectionable items. It is a good plan to have the test criticized
by other teachers of the same subject. In this way some items are likely
to be found which cover points of doubtful importance, others which are
not clearly stated, and perhaps others about which there is disagreement
as to the answers. The wording of the items should receive critical
attention, particularly to avoid ambiguity. One serious error is the wording of
items so that more than one reasonable interpretation is possible. The
trouble with such ambiguous items is that a certain answer is correct with
one interpretation, but with another interpretation a different answer is
reasonably correct.

6. The items should be so phrased that the content, rather than the form of
the statement, will determine the answer. A common mistake is to include a
telltale word or phrase that affords an unwarranted clue to the answer.
These so-called specific determiners are especially common in true-false
items. It has been found that statements containing emphatic words, such
as the adverbs “always,” “never,” “entirely,” “absolutely,” “exclusively,”
and the like, are much more likely to be false than true. On the other hand,
words or expressions that limit the statement, such as “may,” “sometimes,”
“as a rule,” “in general,” and the like, are much more likely to be true than
false. Either these expressions should be avoided entirely, a suggestion
which is rarely feasible, or items containing them should be carefully
balanced so that approximately the same number are true as false. Avoiding
the language of the text will prevent pupils with good rote memories from
answering items they may not understand. Sometimes clues are afforded
by the spelling or by the grammatical form of the item. It is not unlikely
that one of the reasons why many pupils prefer objective tests to other
types is that such tests often contain items so worded as to be answered
from a minimum knowledge of the subject matter involved. Such defects,
however, are not inherent in objective testing; they can be avoided by the
alert test maker. Administering the test to persons unfamiliar with the
content of the course will often reveal those items which can be answered
from general intelligence or from a general knowledge of language forms
and usage.
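
The advice to balance true and false items that contain specific determiners can be checked mechanically. The sketch below is an illustration only: the word lists are the examples quoted in principle 6, while the function name `determiner_balance` and the four sample statements with their keyed answers are hypothetical.

```python
import re

# Word lists taken from the examples quoted in the text.
EMPHATIC = ("always", "never", "entirely", "absolutely", "exclusively")
LIMITING = ("may", "sometimes", "as a rule", "in general")


def determiner_balance(items):
    """Tally keyed answers of true-false items that contain specific determiners.

    items is a list of (statement, keyed_answer) pairs; keyed_answer is a bool.
    """
    tally = {"emphatic": {True: 0, False: 0}, "limiting": {True: 0, False: 0}}
    for statement, answer in items:
        text = statement.lower()
        for label, words in (("emphatic", EMPHATIC), ("limiting", LIMITING)):
            # Whole-word match, so that "may" does not fire on "mayor".
            if any(re.search(rf"\b{re.escape(w)}\b", text) for w in words):
                tally[label][answer] += 1
    return tally


# Hypothetical true-false items with their keyed answers.
draft = [
    ("Water always boils at 100 degrees Celsius.", False),
    ("A triangle never has more than one right angle.", True),
    ("Plants may grow faster with added fertilizer.", True),
    ("In general, mammals lay eggs.", False),
]
report = determiner_balance(draft)
print(report)
```

In this made-up draft each kind of determiner appears once in a true item and once in a false item, which is the balance the text recommends. A lopsided tally would point to items worth rewording.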

The opposite mistake is often made also. Figurative language, needlessly
heavy vocabulary, or involved sentence structure may so obscure the
meaning of an item that it is marked incorrectly by pupils who really understand
the point. Bob Burns’ story of the time Grandpa Snazzy was a witness in
court illustrates this error:

The attorney says, “Now, Mr. Snazzy, did you or did you not, on the date in
question or at any time previously or subsequently, say or even intimate to the
defendant or anyone else, whether friend or mere acquaintance or in fact a total
stranger, that the statement imputed to you, whether just or unjust and denied
by the plaintiff, was a matter of no moment or otherwise? Answer: did you or
did you not?”

Grandpa thinks a while and then says, “Did I or did I not what?”

Unless the test aims specifically to measure reading ability or general
intelligence, the form of the item should neither impose unreasonable
obstacles in the pupil’s way nor provide clues which are too obvious. Both
defeat the purpose for which the test was intended.

7. The items should be so worded that the whole content functions in
determining the answer, rather than only a part of it. There is often a wide
discrepancy between what actually determines the pupil’s response to a test
and what the teacher intended. One of the principal reasons for this
discrepancy is that only a part of the content of the item functions, the rest
being wholly inert as far as the pupil is concerned. Lindquist gives some
excellent examples of this difficulty. Two of these, shown below, should
make the problem clear. Note the first:

The leader in the making of the compromise tariff of 1833 was (1) Clay,
(2) Webster, (3) Jackson, (4) Taylor, (5) Harrison.

That the majority of the pupils who responded to this item correctly did
so on the superficial basis of the strong verbal association between the
words “compromise” and “Clay” is evidenced by the fact that fewer than
half of them responded correctly when the item appeared in the following
form:

The leader in the tariff revision of 1833 was (1) Clay, (2) Webster, (3) Jackson,
(4) Taylor, (5) Harrison.

That the matching type of test is also subject to this error is shown by
the next illustration:


Herbert E. Hawkes, E. F. Lindquist, and C. R. Mann, The Construction and Use of
Achievement Examinations, pages 73-81. Boston: Houghton Mifflin Company, 1936.



Directions: Below are two columns of items. Match the items in the two columns.

Column A

1. a Phoenician contribution to civilization
2. most famous building of the ancient Greek world
3. the fleet whose defeat in 1588 gave England the control of the Atlantic Ocean
4. a boundary between two colonies that later became famous as the division
   between free and slave territory
5. the victory which caused France to come to our aid during the Revolutionary War
6. the law that forbade slavery north of the Ohio River
7. a ruling by the Supreme Court which opened all territory to slavery

Column B

1. Mason and Dixon Line
2. Spanish Armada
3. Saratoga
4. Dred Scott Decision
5. Parthenon
6. Missouri Compromise
7. Alphabet
8. Printing Press
9. Ordinance of 1787

In most of the above items a single word gives the clue. For example,
“boundary” in item 4 suggests “Line” in response 1. Likewise either
“ruling” or “court” in item 7 suggests “Decision” in response 4. If a pupil
knows that “armada” means “fleet,” he would be able to match item 3
with response 2 without knowing the date, the country, or the event
involved. It should be noted, furthermore, that probably the above test
would still be poor, even if each item were well worded, because the items
included are so diverse in character.

The test maker should attempt to anticipate the specific mental
processes the pupil will employ in each response. For each item the teacher
should raise such questions as the following: Are there any parts of the
item that the pupil may disregard entirely and yet respond correctly?
What is the minimum amount of knowledge required for a correct response?

8. All the items of a particular type should be placed together in the test.
Sometimes completion, true-false, and multiple-choice items of varying
numbers of choices are thrown together in random order. This arrangement
is rarely, if ever, desirable. It is good practice to place together the items
of similar type. Such an arrangement not only facilitates the scoring of the
test and the interpretation of the scores but enables the pupil to take full
advantage of the mind set imposed by a particular item form.

9. The items in the test should be arranged in ascending order of difficulty.
It is especially important to have the easiest items at the beginning and
the hardest ones at the end of the test. It will be recalled that one of the
problems of measurement is to arrange conditions so that the thing being
measured is disturbed as little as possible in the act of measuring. The
psychological justification for placing the easiest items first is that such an
arrangement has a wholesome effect upon the morale of the pupils taking
the test. On the other hand, placing very difficult items at the beginning
is likely to produce needless discouragement in the pupils, particularly with
those of average ability and below. If the most difficult items come toward
the end of the test, only the more capable pupils will probably get to them.
After all, the only function of such items is to discriminate among the
high-ranking pupils. In any event, any disturbing influence on the weaker pupils
will come too late to affect the results seriously.

In advance of an actual tryout of the test, it is impossible to determine
anything more than a rough estimate of the true difficulty order of the
items, unless one is willing to go to the trouble of obtaining the pooled
judgment of three or more persons. The judgment of a single experienced
teacher regarding the difficulty of the items is likely to have some validity.
In any case it is usually possible to pick out those items that will be at the
extremes of the scale, and fortunately this is what is needed most. In later
revisions of the test, the items can be placed in more exact order of
difficulty.

10. A regular sequence in the pattern of responses should be avoided. The
order of correct responses should be a chance order rather than a regular
pattern. If items are arranged alternately true and false, or two true and
two false, for example, the pupil is likely to discover the arrangement.
To facilitate scoring it is sometimes suggested that multiple-choice items
be so arranged that the option numbers of the correct responses give
combinations easy to remember, such as a date like 1453. But there is always
risk that the pupil will “get the hang” of the pattern and answer
successfully without considering the content of the item at all.
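
A chance order of correct responses is easy to produce and to check. The sketch below is offered as one possible illustration, not as the text's own method: the helper names `shuffle_options` and `has_regular_pattern` are hypothetical, the seed is arbitrary, and the pattern check covers only keys that repeat with a short period (such as alternating true and false, or two true and two false). It would not catch a memorable but non-repeating key such as a date.

```python
import random


def shuffle_options(correct, distractors, rng):
    """Return (options, position) with the correct answer in a chance position."""
    options = [correct] + list(distractors)
    rng.shuffle(options)
    return options, options.index(correct) + 1  # positions are numbered from 1


def has_regular_pattern(key, period_max=4):
    """Flag an answer key that repeats with a short period (e.g. T F T F)."""
    for period in range(1, period_max + 1):
        if len(key) >= 2 * period and all(
            key[i] == key[i - period] for i in range(period, len(key))
        ):
            return True
    return False


rng = random.Random(7)  # arbitrary seed, used only to make the run repeatable
options, position = shuffle_options(
    "Clay", ["Webster", "Jackson", "Taylor", "Harrison"], rng
)
print(options, position)

print(has_regular_pattern("TFTFTFTF"))  # True: alternately true and false
print(has_regular_pattern("TTFFTTFF"))  # True: two true and two false
print(has_regular_pattern("TFFTFTTT"))  # False: no short repeating cycle
```

Shuffling each item independently makes a regular sequence unlikely but not impossible on a short test, which is why a check on the finished key is still worth running.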

11. Provision should be made for a convenient written record of the pupil’s
responses. Such a record is a check list, a rating scale, or some other similar
form upon which the observer makes a systematic and permanent record
of a pupil’s behavior under a given set of conditions. It is particularly
difficult to provide a satisfactory written record of responses on oral
quizzes. In the ordinary test the pupil makes his own record in writing,
either on the test paper or a specially prepared answer sheet. The problem
then is merely that of arranging the test so that the labor of scoring will be
reduced to a minimum. Such devices as numbering or lettering the
responses in multiple-choice items and the blanks in completion items so
that the responses will be recorded in a column rather than scattered
irregularly over the page, save time and reduce the chances of error in
scoring. Moreover, grouping the items by fives rather than spacing them
uniformly reduces somewhat the eyestrain in scoring the test.

Sherman Tinkelman, Difficulty Prediction of Test Items, page 49. New York: Bureau
of Publications, Teachers College, Columbia University, 1947. Further investigation
of this problem was undertaken by Irving Lorge and Lorraine Kruglov, “A
Suggested Technique for the Improvement of Difficulty Prediction of Test Items,”
Educational and Psychological Measurement, 12:554-561, Winter, 1952.
Scarvia B. Anderson, “Sequence in Multiple Choice Item Options,” Journal of
Educational Psychology, 43:364-368, October, 1952.
... Kostick and ... M. Nixon, “How to Improve Oral Questioning,” Peabody
Journal of Education, 30:209-217, January ...

12. The directions to the pupil should be as clear, complete, and concise as
possible. The aim should be to make the instructions so clear that the ...
young children ... a line under ... rather than “underline” ... rather
than “encircle the ...” In the lower grades it is usually desirable for the
teacher to read the directions aloud to the pupils while they follow silently.
Wherever the form of the test is ... complicated, a generous use of samples
correctly marked and of practice tests that do not count in determining the
score is to be recommended. Sometimes a blackboard demonstration is ...
the procedure clear. As the pupils become familiar with the ... and the
procedure used in scoring them, the ... greatly abridged ... to make the
points clear. The following direc ... objective tests ... about measurement in ...

DIRECTIONS: ... Group tests of intelligence ...

After the pupils ... method employed in scoring them ... directions may be
shortened to a form somewhat as follows:

DIRECTIONS: ... have ten minutes for the test.


112 THF CO^S1R[>CTIO\ OF TLiCIlIR MADE TESTS 


measured is disturbed as little as possible in the act oi measunng The 
psychological justification for placing the easiest items first is that such an 
arrangement has a ^\holesome effect upon the morale of the pupils taking 
the test On the other hand, placing ^ery difficult items at the begmmng 
IS likely to produce needless discouragement in the pupils, particularly' wth 
those of average ability and below If the most difficult items come toward 
the end of the test, only the more capable pupils will probablv get to them 
After all the onlv function of such items is to discnminate among the high- 
ranking pupils In any event, any disturbing influence on the weaker pupils 
will come too late to affect the results seriously 
In advance of an actual tryout of the test, it is impossible to determine
anything more than a rough estimate of the true difficulty order of the
items, unless one is willing to go to the trouble of obtaining the pooled
judgment of three or more persons. The judgment of a single experienced
teacher regarding the difficulty of the items is likely to have some validity.
In any case, it is usually possible to pick out those items that will be at the
extremes of the scale, and fortunately this is what is needed most. In later
revisions of the test, the items can be placed in more exact order of
difficulty.


10. A regular sequence in the pattern of responses should be avoided. The
order of correct responses should be a chance order rather than a regular
pattern. If items are arranged alternately true and false, or two true and
two false, for example, the pupil is likely to discover the arrangement.
To facilitate scoring it is sometimes suggested that multiple-choice items
be so arranged that the option numbers of the correct responses give
combinations easy to remember, such as a date like 1453. But there is always
risk that the pupil will "get the hang" of the pattern and answer
successfully without considering the content of the item at all.
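
A chance order of correct responses can be produced mechanically. The sketch below is only an illustration: the function name `chance_key` and the `max_run` safeguard (redrawing a key in which the same position is correct too many times in a row) are assumptions of this example, not a procedure given in the text.

```python
import random


def chance_key(n_items, n_options, max_run=3, seed=None):
    """Return the position (1..n_options) of the correct response for each item.

    Positions are drawn at random. As a hypothetical safeguard, a key is
    redrawn if one position is correct more than max_run times in a row.
    """
    rng = random.Random(seed)
    while True:
        key = [rng.randint(1, n_options) for _ in range(n_items)]
        run = longest = 1
        for previous, current in zip(key, key[1:]):
            run = run + 1 if current == previous else 1
            longest = max(longest, run)
        if longest <= max_run:
            return key


# Example: a key for twenty four-option items.
print(chance_key(20, 4, seed=1))
```

Passing a `seed` makes the key reproducible, which is convenient when the same key must be regenerated for scoring.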

11. Provision should be made for a convenient written record of the pupil's
responses. Such a record is a check list, a rating scale, or some other similar
form upon which the observer makes a systematic and permanent record
of a pupil's behavior under a given set of conditions. It is particularly
difficult to provide a satisfactory written record of responses on oral
quizzes. In the ordinary test the pupil makes his own record in writing,
either on the test paper or a specially prepared answer sheet. The problem
then is merely that of arranging the test so that the labor of scoring will be
reduced to a minimum. Such devices as numbering or lettering the responses
in multiple-choice items and the blanks in completion items so that the
responses will be recorded in a column rather than scattered irregularly
over the page, save time and reduce the chances of error in scoring.
Moreover, grouping the items by fives rather than spacing them uniformly
reduces somewhat the eyestrain in scoring the test.

... Difficulty Prediction of Test Items, page 40. New York: Bureau of
Publications, ... Columbia University, 1947. Further investigation ... by
Irving Lorge and Lorraine Kruglov, "A ... Improvement of Difficulty
Prediction of Test Items," ... Winter, 1952.

12. The directions to the pupil should be as clear, complete, and concise as
possible. The aim should be to make the instructions so clear that the ...
the items, the time allowed, ... in scoring. The amount of detail will depend
upon the maturity of the pupils and their ... With young children, or ...,
"draw a line under" rather than "underline," ... rather than "encircle the
..." In the lower grades it is usually desirable for the teacher to read the
directions aloud to the pupils while they follow silently the written
directions on their test papers. Wherever the form of the test is ...
complicated, a generous use of samples correctly marked and of fore-exercises
or practice tests that do not count in determining the score is recommended.
Sometimes a blackboard demonstration is the ... make the procedure clear.
As the pupils become familiar with the ... forms and the procedure used in
scoring them, the ... greatly abridged ... the points clear. The following
direc... objective tests ...

DIRECTIONS: To ... about measurement in ... statement ... test. Your ...

+ 1. Group tests of intellig...

After the pupils ... method employed in scoring them, ... directions may be
shortened to a form somewhat as follows:

DIRECTIONS: In the ... before each item put + ... You will have ten minutes
for the test.

13. Should pupils be told or encouraged ... authorities would require ...
tests. They would include some such statement as, "If you do not know,
guess." Others would go to the other extreme and say, "Do not guess."
Still others, perhaps the majority, would be content with informing the
pupil that the "correction formula" is to be employed and let him use his
judgment about attempting doubtful items. Unfortunately, the experimental
evidence on this point is neither extensive nor altogether convincing.
Most of the studies have merely attempted to compare the relative effect
of the first two practices upon the validity and reliability of the scores,
without considering the third possibility at all.

The results have usually favored do-not-guess-wildly instructions by a
slight margin. Davis is particularly opposed to forced guessing:

To force students to mark answers to items based on reading passages available
for reference is undesirable, but to force them to mark answers to items testing
specific subject matter that they do not know and cannot figure out is far worse.
It is not only frustrating to the students, but it goes contrary to good teaching
practices and compels the students to break habits of carefulness that the schools
try hard to inculcate. Once in a while it is possible that students might be told
that as part of an experiment they are required to mark every item in a test even
when they have no idea what to mark, but in systematic testing programs this
would be inadvisable as well as unpractical. It might eliminate variations in the
number of omissions and thus wipe out some of the effects of differences in
personality, but it would do this at a cost of antagonizing teachers and frustrating
students. It would also introduce additional chance variance into the scores.

However, Votaw, Lentz, Cronbach, and others have found some
evidence of a factor of "acquiescence" operating in taking tests, since
do-not-guess instructions placed ascendant students at an advantage over
submissive students. It is argued that such instructions reduce the validity
of achievement tests, since they become in some degree measures of
personality traits. Some investigators have also found that good students tend
to improve their scores when they attempt doubtful items, whereas poor
students do not.

All things considered, the authors offer the following recommendations:

a. The use of multiple-choice tests with fewer than four responses to each
item should be avoided wherever possible.


This formula is discussed on pages 156-158.

Frederick B. Davis, "Item Selection Techniques," in E. F. Lindquist (Editor),
Educational Measurement, pages 274-275. Washington, D.C.: American Council on
Education, 1951.

David F. Votaw, "The Effect of Do-not-guess Directions upon the Validity of
True-false or Multiple-choice Tests," Journal of Educational Psychology,
27: 698-703 ...

Theodore F. Lentz, "Acquiescence as a Factor in the Measurement of Personality,"
Psychological Bulletin, 35: 659, November, 1938.

Lee J. Cronbach, "Studies of Acquiescence as a Factor in the True-false Test,"
Journal of Educational Psychology, 33: 401-415, September, 1942; "Further
Evidence on Response Sets and Test Design," Educational and Psychological
Measurement, 10: 3-31, Spring, 1950.



GENERAL PRINCIPLES OF TEST CONSTRUCTION 155


b. ... of possible responses, the score should probably ...

c. When tests with only two or three possible responses to each item are
used with pupils above the sixth grade, the correction formula should be
employed.

d. Whenever the correction formula is to be used, the pupils should be
so informed.

e. The reason for the correction formula should be discussed with pupils
before the test begins. They should then be allowed to use their best
judgment, without being specifically advised by the directions to guess or
not to guess.


C. Trying Out the Test

After the test has been prepared according to plan, it is ready to be given
a trial in actual use. Since it is impossible in advance to know exactly how
good the test is or to locate all the poor items, the tryout should be
considered a necessary step in constructing the test in its final form. The
following four principles should govern the tryout. With the possible
exception of the second, these principles are equally applicable to the later
use of the test in its final form.

1. Every reasonable precaution should be taken to insure normal conditions
for the test. This is important because the responses to any test are partly
determined by the conditions under which it is given, as well as by the test
itself. It is usually well to have the test administered to the pupils in the
familiar environment of their own classroom. Any tendency to cheat should
be forestalled by careful supervision. Where cheating is likely to be a special
problem, pupils may be so seated that every other seat is vacant, or the test
items may be arranged in different orders for pupils seated close together.

2. The time allowance for the test should be generous. This is more
important in the tryout than in the later use of the test in its final form. One
reason for this is that the items are arranged at best in only a rough order
of difficulty, and, if the time allowance is too short, pupils may not have
time to try items toward the end of the test, which they may be capable of
answering correctly. Short time allowances should be avoided, therefore,
in order to secure the data needed for determining the difficulty and the
discriminating value of the items. What time allowance is to be considered
generous will depend upon the purpose of the test and upon the ability and
experience of the pupils. For example, it is obvious that the time limits of
speed tests should be so short that even the best pupil does not have time
to finish the test. On the other hand, more time should be allowed for
diagnostic tests than for tests of general achievement, and tests of a purely
factual character can be answered more quickly than those involving the
higher mental processes.

Lindquist suggests that in general achievement tests, the time allowance
should be so adjusted that "at least 75 per cent of the pupils will have time
at least to consider all items in each section." Ruch seemed to favor time
limits "so that 90 per cent can attempt all items within their power."
In accordance with this standard, Ruch suggested that for fairly short items
of a factual character, three recall or four recognition items per minute is
a "reasonable expectancy for upper-elementary and high-school pupils."
For reasoning tests the corresponding time allotments would be increased
for recall items to one or two items per minute, and for multiple-choice
items to two or three items per minute. Younger pupils and longer or
harder items would demand still more time.
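
Ruch's rates can be turned into a rough first estimate of testing time. The following is a minimal sketch; the function name `minutes_needed` and its parameters are illustrative assumptions, and the default rates are simply the three recall and four recognition items per minute quoted above for short factual items.

```python
def minutes_needed(n_recall, n_recognition,
                   recall_per_minute=3.0, recognition_per_minute=4.0):
    """Estimate working time in minutes from item counts and items-per-minute rates."""
    return n_recall / recall_per_minute + n_recognition / recognition_per_minute


# 30 short factual recall items and 40 recognition items.
print(minutes_needed(30, 40))

# The same items treated as reasoning items, using the slower ends of the
# quoted ranges (one recall item and two multiple-choice items per minute).
print(minutes_needed(30, 40, recall_per_minute=1.0, recognition_per_minute=2.0))
```

Such a figure is only a starting point for the ordinary use of the test; for the tryout itself more time should be allowed.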

The above standards have in mind the requirements for the ordinary use
of the test in its final form, rather than for the tryout, for which more time
should be allowed. Since so many factors influence the time demands of a
particular test, the writers suggest that in the tryout sufficient time be
allowed so that all, or almost all, the pupils have time to finish. If the
examiner will record during the progress of the test the percentage of the
pupils who are still at work after various amounts of time have elapsed,
this information will be useful in determining the time allowances for later
revisions of the test.
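
The examiner's record of pupils still at work can be read off directly against a standard such as Lindquist's 75 per cent or Ruch's 90 per cent. The sketch below assumes the record is kept as pairs of (minutes elapsed, per cent still at work); that layout, the function name `time_allowance`, and the sample figures are hypothetical.

```python
def time_allowance(records, target_finished=90.0):
    """Return the earliest recorded time by which target_finished per cent had finished.

    records: list of (minutes_elapsed, percent_still_working) pairs.
    Returns None if the target was never reached during the tryout.
    """
    for minutes, still_working in sorted(records):
        if 100.0 - still_working >= target_finished:
            return minutes
    return None


# Hypothetical tryout record.
tryout = [(10, 80), (20, 40), (30, 20), (40, 8), (50, 0)]
print(time_allowance(tryout, target_finished=75))  # 75 per cent standard
print(time_allowance(tryout, target_finished=90))  # 90 per cent standard
```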

3. The scoring procedure adopted should be fairly simple. As a rule, the best
procedure in scoring objective tests is to give one point of credit for each
correct response. In multiple-choice tests this means one point for each item
properly marked, and in recall tests it means one point for each blank
correctly filled. It is unnecessary to weight the items according to estimated
difficulty or importance. Even in essay examinations weighting is much less
important than is ordinarily assumed. Almost all pupils will be in the same
rank order regardless of the weighting of the individual items.

The correction-for-chance formula actually corrects for omissions rather
than for guessing alone. If no items are omitted by the students, or if they
all omit the same number of items, their relative scores will be the same
regardless of whether or not it is employed. The general formula is usually
written

$$S = R - \frac{W}{O - 1}$$



... Lindquist ..., op. cit., page 116.

G. M. Ruch, The Objective or New-Type Examination, ... Chicago: Scott,
Foresman & Company, 1929.

... Phillips, "Further Evidence Regarding Weighted Versus Unweighted ...,"
Educational and Psychological Measurement, 3 ... Summer ...




In this formula

S is the score corrected for guessing,
R is the number of right responses,
W is the number of wrong responses, not counting omitted items, and
O is the number of options presented for each item.

For two-option or true-false items this becomes

$$S = R - W$$

For three-option items the formula is

$$S = R - \tfrac{1}{2}W$$

For four-option items the formula is

$$S = R - \tfrac{1}{3}W$$

For five-option items it is

$$S = R - \tfrac{1}{4}W$$
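
The formulas above differ only in the divisor applied to the wrongs, so a single routine covers them all. This is a minimal sketch; the function name `corrected_score` is an assumption of the example, and exact fractions are used only so that the thirds and fourths are not rounded.

```python
from fractions import Fraction


def corrected_score(rights, wrongs, options):
    """Score corrected for chance: rights minus wrongs / (options - 1).

    Omitted items are counted neither as rights nor as wrongs.
    """
    if options < 2:
        raise ValueError("each item needs at least two options")
    return Fraction(rights) - Fraction(wrongs, options - 1)


print(corrected_score(50, 50, 2))  # true-false: R - W
print(corrected_score(30, 9, 4))   # four-option: R - W/3
print(corrected_score(40, 12, 5))  # five-option: R - W/4
```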

If the items have six or more options, it is probably not worth while to
correct the "rights" score for chance.
The above formulas are designed to reduce to zero the score of each
person who is totally ignorant concerning the material as presented in the
test and who guesses at the right answers with a degree of success
dependent upon the number of options each item has. For example, if the test
contains 100 true-false items and if the testee guesses at each of these,
he should on the average answer about 50 items "correctly," since there is
one chance in two that he will mark any item right purely by accident.
Thus the expected score for a wholly uninformed person would be 50 right
and 50 wrong. However, he rightly deserves a final score of zero, which
represents his knowledge of the material covered, so from the 50 rights are
subtracted the 50 wrongs: R - W = 50 - 50 = 0.

On the other hand, if he answers 50 items correctly and omits the
other 50, his score will be 50 - 0 = 50. Presumably, had he tried the 50
omitted items, he would have answered half of them (25) correctly by
chance and missed the other 25, making his rights score 50 known + 25
guessed = 75 and his wrongs score 25; 75 - 25 = 50, the same score he
would have secured without any guessing. There is a fallacy in this
argument, though, for a student is not likely to know the answers to half the
questions and be absolutely ignorant concerning the other half. More
likely, he has various degrees of partial information and misinformation
concerning a considerable number of items. In many test situations these
two types of information seem to cancel each other out, thereby making
the $R - \frac{W}{O - 1}$ formula suitable.

Frederick B. Davis, op. cit., page 271.
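
The arithmetic of the two cases just described can be checked with expected values. The sketch below rests on the same simplifying assumption the text calls a fallacy, namely that each item is either fully known or blindly guessed; the function name `expected_corrected_score` is hypothetical.

```python
from fractions import Fraction


def expected_corrected_score(known, guessed, options):
    """Expected corrected score when `known` items are answered from knowledge
    and `guessed` items are marked blindly among `options` choices."""
    expected_rights = known + Fraction(guessed, options)
    expected_wrongs = Fraction(guessed * (options - 1), options)
    return expected_rights - expected_wrongs / (options - 1)


print(expected_corrected_score(0, 100, 2))  # wholly uninformed testee
print(expected_corrected_score(50, 50, 2))  # knows 50, guesses the other 50
print(expected_corrected_score(50, 0, 2))   # knows 50, omits the other 50
```

Under this assumption blind guessing adds nothing to the expected corrected score, whatever the number of options.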





It should be emphasized that, strictly speaking, the correction formula
is needed only when some students have omitted a fairly large number of
items, while others have omitted few. Otherwise, the ranks of the students
will be unchanged, regardless of whether or not their scores are corrected
for "chance." For psychological reasons the teacher may report corrected
scores to the students, even though few items have been omitted. This is
especially advisable with true-false tests, where the poorest students may
not realize the extent of their ignorance and hence may protest if given low
grades unless the R - W formula is employed.
If all individuals tested answer every item, the standard deviation of
scores corrected for chance is $\frac{O}{O - 1}$ times the standard deviation
of the rights scores. From this relationship it is apparent that correcting
total scores from a "do-guess" true-false test doubles their standard
deviation: $O \div (O - 1) = 2$ divided by $(2 - 1) = 2$. Likewise, if there
are no omits, the standard deviation of corrected-for-chance
five-option-item test scores is $\frac{5}{4} = 1.25$ times the standard
deviation of the uncorrected scores.
The authors recommend using the correction formula with true-false and
two- and three-option multiple-choice test scores, even when omissions are
negligible, in order to emphasize the range of knowledge within the group
tested.
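
The stated ratio of standard deviations can be verified numerically. The sketch below uses a made-up set of rights scores on a 20-item test with no omissions; the function name `sd_ratio` and the data are assumptions of the example.

```python
from statistics import pstdev


def sd_ratio(rights_scores, n_items, options):
    """Ratio of the SD of corrected scores to the SD of rights scores,
    assuming every pupil answers every item (wrongs = n_items - rights)."""
    corrected = [r - (n_items - r) / (options - 1) for r in rights_scores]
    return pstdev(corrected) / pstdev(rights_scores)


rights = [12, 15, 18, 20, 11, 17]  # hypothetical rights scores, 20 items
print(sd_ratio(rights, 20, 2))  # true-false items
print(sd_ratio(rights, 20, 5))  # five-option items
```

Up to rounding in floating-point arithmetic, the two ratios agree with $O/(O - 1)$, that is, 2 and 1.25.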

4. Before the actual scoring begins, answer keys and scoring rules should be
prepared. In teacher-made objective tests satisfactory scoring keys can be
prepared by simply filling in the correct responses, preferably with a colored
pencil, on one of the unused tests. Scoring then consists of comparing the
pupil's responses with those on the key placed beside his paper. In essay
examinations the key consists of a model paper containing a complete set
of answers, together with the points to be allowed on each. Definite rules
are necessary to secure uniformity in scoring. The rules for scoring objective
tests usually say merely that one point will be allowed for each correct
response and that no fractional credits will be allowed, and indicate whether
or not the correction formula will be used. The rules for essay examinations
give the weight for each question, and tell whether or not any deductions
are made for errors in spelling, language usage, and so forth. In
mathematics tests the rules should cover such points as whether or not the
answers must be reduced to lowest terms, whether or not credit will be
allowed for solutions correct in principle but with the wrong answer, and
the like.
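
Comparing a paper with the key amounts to a simple tally of rights, wrongs, and omissions. The sketch below is illustrative only; the function name `score_paper` and the convention of recording an omitted item as `None` are assumptions of the example.

```python
def score_paper(responses, key):
    """Tally one pupil's paper against the key.

    responses: the pupil's answers in item order, with None for an omitted item.
    key: the correct answers in the same order.
    Returns (rights, wrongs, omits); one point per correct response, no
    fractional credit.
    """
    if len(responses) != len(key):
        raise ValueError("paper and key must cover the same items")
    rights = wrongs = omits = 0
    for given, correct in zip(responses, key):
        if given is None:
            omits += 1
        elif given == correct:
            rights += 1
        else:
            wrongs += 1
    return rights, wrongs, omits


key = ["T", "F", "T", "T", "F"]
paper = ["T", "T", None, "T", "F"]
print(score_paper(paper, key))
```

Keeping wrongs and omissions separate is what makes it possible to apply the correction formula afterward, since omitted items are not counted as wrong.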


If the students' answers have been recorded on a special answer sheet,
such as the International Business Machines general purpose answer sheet,
then a punched-out cardboard key will speed up scoring a great deal. One of
the most useful of the various answer sheets is shown in Figure 5.

Answer sheets and the key punch may be purchased from International
Business Machines Corporation, Endicott, New York.




D. Evaluating the Test

After the papers have been scored, the results should be interpreted and
evaluated from two points of view: first, as to the quality of the test itself,
and, second, as to the quality of the pupils' responses. While the ultimate
interest of the test maker is in the light thrown by the test results upon the
quality of the teaching and organization that exists in the school, his first
concern should be the quality of the test used. Only tests of high merit
afford suitable information regarding the school situation. To what extent,
then, does the test possess the three characteristics of satisfactory
measuring instruments: validity, reliability, and usability? Only the last of
these can be confidently determined in advance. The five principles that
follow are suggested for evaluating the test from the viewpoint of its validity
and reliability. If the test is found to possess these qualities in high degree,
the scores should then be carefully analyzed for their value in instruction
and school administration. If the test is found to lack these qualities, the
scores can be disregarded and the test subjected to a thorough revision.
No matter how carefully the test is prepared in the first place, its merits
should be established and not merely assumed.

Figure 5. IBM General Purpose Answer Sheet

1. The difficulty of the test is a rough indication of its adequacy. The
difficulty of the test as a whole is determined by finding what percentage the
average score made (corrected for chance, if appropriate) is of the maximum
possible score. In general achievement tests, the nearer this average is to
50 per cent, the better. The difficulty of the individual test items is obtained
by finding the percentage of successful responses for each item, usually
corrected for "chance" in the same manner as the total score. Items answered
by 100 per cent or by 0 per cent of the pupils are of no value in a test of
general achievement. The difficulty of the test is relatively unimportant
in mastery tests and in diagnostic tests.
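
Both difficulty figures are simple percentages. The sketch below is a minimal illustration; the function names and the sample figures are assumptions, and the optional chance correction for an item follows the same rights-minus-wrongs-over-options-less-one rule used for total scores.

```python
def difficulty_of_test(scores, max_score):
    """Average score expressed as a percentage of the maximum possible score."""
    return 100.0 * (sum(scores) / len(scores)) / max_score


def item_difficulty(n_correct, n_incorrect, n_testees, options=None):
    """Percentage of successful responses to one item.

    If `options` is given, the count of correct responses is first corrected
    for chance: n_correct - n_incorrect / (options - 1).
    """
    if options is None:
        successes = n_correct
    else:
        successes = n_correct - n_incorrect / (options - 1)
    return 100.0 * successes / n_testees


print(difficulty_of_test([10, 20, 30], max_score=40))
print(item_difficulty(28, 12, 40))             # uncorrected
print(item_difficulty(28, 12, 40, options=5))  # corrected for chance
```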

2. The internal consistency of the individual items in the test is determined
by their ability to discriminate between pupils who rank high and those who
rank low on the test as a whole. There are several methods of determining
this. Only the simplest of these methods are practical for use with informal
tests. A satisfactory procedure for the classroom teacher is to determine
the number of correct responses (or of incorrect responses) to each test item
of the pupils who rank in the highest 27 per cent of the class on the test
as a whole and to compare this with the corresponding number in the
lowest 27 per cent. Items in which the number of correct responses of the
high group exceeds that of the low group by the largest amount are best;
those in which the numbers are the same are useless; and those in which the
number of correct responses of the high group falls behind that of the low
group are detrimental. Items showing zero or negative discrimination should
be either reworded or thrown out altogether.

For a brief but suggestive report see Ellis Weitzman and Walter J. McNamara,
"Techniques Used in Analyzing the Learning Achievement of Naval Aviation
Cadets," Journal of Educational Psychology, 35: 181-185, March, 1944.

Proportion of testees who know the answer to an item = (number who marked
it correctly minus the quotient of the number who marked it incorrectly and
the number of options the item has minus one), divided by the total number
of testees. In symbols, $R - \frac{W}{O - 1}$ divided by the total number of
testees. This formula is appropriate only when nearly all examinees have
had time enough to ... Davis, "Item Selection Techniques," pages ..., in
E. F. Lindquist (Editor), Educational Measurement. Washington, D.C.:
American Council on Education, 1951.
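
The comparison of high and low groups on one item can be tallied as follows. This is a sketch under stated assumptions: the function name `discrimination_counts` is hypothetical, group size is taken as 27 per cent of the class rounded to the nearest whole pupil (at least one), and ties at the group boundary are broken by the order in which pupils are listed.

```python
def discrimination_counts(total_scores, item_correct, fraction=0.27):
    """Count correct responses to one item in the high and low groups.

    total_scores: each pupil's score on the test as a whole.
    item_correct: 1 if that pupil answered the item correctly, else 0.
    Returns (high_count, low_count, high_count - low_count).
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high, low = ranked[:group_size], ranked[-group_size:]
    high_count = sum(item_correct[i] for i in high)
    low_count = sum(item_correct[i] for i in low)
    return high_count, low_count, high_count - low_count


# Hypothetical class of eleven pupils and their responses to one item.
totals = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
print(discrimination_counts(totals, item))
```

A positive difference marks an item that discriminates in the desired direction; a zero or negative difference marks one to be reworded or discarded.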

3. It is a good practice to have the items interpreted or criticized by persons
who have taken the test. It is impossible to anticipate fully all the mental
processes pupils will employ in responding to a test item. These can be
determined only by making inquiry of pupils who have taken the test.
In this way irrelevancies and ambiguities will be revealed that were wholly
unsuspected by the maker of the test. Often a slight change in wording is
sufficient to remedy the difficulty. At other times the item must be entirely
discarded. If a test contains too many of these items, the scores on the test
should not be counted in determining the pupil's record in the class.
Inviting members of the class to assist in this critical evaluation of the test
may help to create a favorable attitude toward the measurement process
employed by the instructor and is a valuable educational experience in itself.

4. Whenever possible, the results on the test should be checked against an
outside criterion. For short tests covering small units of subject matter
this process is likely to be difficult and of little value. Even here it is
sometimes helpful to compare the ranks of the pupils on the test with those
assigned by the teacher before the test is given. The validity of the longer
and more important tests can be determined in a more satisfactory manner
by comparing the scores of the pupils on each test with their scores on a
good standard test covering the same material and given at about the same
time. The coefficient of correlation obtained between the two series of
scores is the most exact method of expressing the amount of agreement,
although a rough indication can be obtained by comparing the percentage
of scores which lie in the same fourths of the two series of scores.
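
Both ways of expressing agreement with an outside criterion can be computed from two lists of scores. The sketch below is illustrative: the function names are assumptions, the correlation is the ordinary product-moment coefficient, and fourths are assigned by rank order within each series, with ties broken by listing order.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment coefficient of correlation between two series of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fourths(scores):
    """Assign each score to a fourth (0 = lowest, 3 = highest) by rank order."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    quarter = [0] * n
    for rank, i in enumerate(order):
        quarter[i] = rank * 4 // n
    return quarter


def percent_same_fourth(x, y):
    """Percentage of pupils falling in the same fourth of both series."""
    same = sum(a == b for a, b in zip(fourths(x), fourths(y)))
    return 100.0 * same / len(x)


teacher_test = [1, 2, 3, 4, 5, 6, 7, 8]           # hypothetical scores
standard_test = [2, 4, 6, 8, 10, 12, 14, 16]      # hypothetical scores
print(pearson_r(teacher_test, standard_test))
print(percent_same_fourth(teacher_test, standard_test))
```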

5. It is sometimes desirable to obtain the reliability coefficient of the test.
The authors recognize that it is easy to overestimate the value of the
reliability coefficient. The makers of standardized tests have often made this
mistake. However, the reliability coefficient does have some merit in
evaluating informal tests, although the value is mainly negative. Low
reliability coefficients indicate tests of doubtful merit, but high reliability
coefficients per se do not establish the value of the tests. To be of real value
these coefficients must be supported by other criteria.

The reasons for selecting contrasting groups from the 27 per cent at the
extremes of the distribution are given by Truman L. Kelley, "The Selection of
Upper and Lower Groups for the Validation of Test Items," Journal of
Educational Psychology, 30: 17-24 ...

Item analysis of a classroom test is set forth in Appendix B, pages ...

In Appendix B, page 452, a simplified method for securing a one-form
reliability coefficient is applied to a typical ...




The construction of an informal teacher-made test, then, involves these
four steps: planning, preparing, trying out, and evaluating. It is perhaps
more correct to say that these activities constitute a cycle in the
construction of a test, for it is often necessary to repeat these steps,
particularly the last three, several times before the test is brought to its
finished form.

Selected References for Further Reading

Adkins, Dorothy C., and others, Construction and Analysis of Achievement Tests.
Washington, D.C.: U.S. Government Printing Office, 1947. 292 pages.

Cronbach, Lee J., Essentials of Psychological Testing. New York: Harper &
Brothers, 1949. 475 pages.

Davis, Frederick B., "The AAF Qualifying Examination," Army Air Forces
Aviation Psychology Research Report No. 6. Washington, D.C.: U.S. Government
Printing Office, 1947. 266 pages.

Gardner, Eric F., "Development and Applications of Tests of Educational
Achievement in Schools and Colleges," Review of Educational Research,
23: 85-101, February, 1953.

Goheen, Howard W., and Kavruck, Samuel, Selected References on Test
Construction, Mental Test Theory, and Statistics, 1929-1949. Washington, D.C.:
U.S. Government Printing Office, 1950. 209 pages.

Goodenough, Florence L., Mental Testing: Its History, Principles, and
Applications. New York: Rinehart & Company, 1949. Chapter 8, "The Analysis
and Selection of Test Items."

Jordan, A. M., Measurement in Education: An Introduction. New York:
McGraw-Hill Book Company, 1933. Chapter 2, "Characteristics of Measuring
Instruments."

Lindquist, E. F., "Preliminary Considerations in Objective Test Construction,"
Chapter 5 in E. F. Lindquist (Editor), Educational Measurement. Washington,
D.C.: American Council on Education, 1951.

Odell, C. W., How to Improve Classroom Testing. Dubuque, Iowa: William C.
Brown Company, 1953. Chapter IV, "Test Construction: General."

Stanley, Julian C., "A Simplified Item-Analysis Procedure," American
Psychologist, 6: 369, July, 1951.

Stanley, Julian C., "A Simplified Method for Estimating the Split-Half
Reliability Coefficient of a Test," Harvard Educational Review, 21: 221-224,
Fall, 1951.

Stanley, Julian C., "'Psychological' Correction for Chance," Journal of
Experimental Education, 22: 297-298, March, 1954.

Travers, Robert M. W., How to Make Achievement Tests. New York: Odyssey
Press, 1950. 180 pages.

Travers, Robert M. W., "Rational Hypotheses in the Construction of Tests,"
Educational and Psychological Measurement, 11: 128-137, Spring, 1951.

Traxler, Arthur E., Jacobs, Robert, Selover, Margaret, and Townsend, Agatha,
Introduction to Testing and the Use of Test Results in Public Schools. New
York: Harper & Brothers, 1953. Chapter 2, "What Do Tests Contribute to
Understanding the Individual Pupil?"

Weitzman, Ellis, and McNamara, Walter J., Constructing Classroom Examinations:
A Guide for Teachers. Chicago: Science Research Associates, 1949. Chapters 1
and 2, "Basic Aspects of Achievement Tests" and "Steps in Classroom Testing."



6 


Principles of Constructing Specific Types 
of Objective Tests 


A. Introduction

Types of objective tests. The principal types of objective test items
used by classroom teachers may be listed as follows:

1. Recall types
   a. Simple-recall
   b. Completion
2. Recognition types
   a. More common
      (1) Alternative-response
      (2) Multiple-choice
      (3) Matching
   b. Less common
      (1) Rearrangement
      (2) Identification
      (3) Analogy
      (4) Incorrect statement

This chapter will consider the uses and limitations of the commonly used
forms of objective tests and suggest rules which have been found to be of
value in constructing them. It will also give illustrative items in a variety
of fields, drawn mainly from standard tests.
Frequency of use by teachers. Two early studies present data on the
frequency of use by classroom teachers of various forms of test items. In the
first of these studies Conneau¹ analyzed 45,418 test items that appeared
in 375 objective examinations submitted in a prize contest. This study
doubtless represented the practice of superior teachers in 1928, rather than
that of average teachers. In 1936 Lee and Segel² reported an analysis of the
types of informal tests used by 1,600 high school teachers distributed
widely over the United States. That there is rather surprising agreement
between these studies is indicated by Table 25. In both studies the
completion form ranks first and the true-false second. Conneau grouped all
recall forms under completion, while Lee and Segel separated out the one-word
answer type. This type of item, which ranked third in the latter study,
has not been included in the table. The next most popular item is the
multiple-choice form. The most striking disagreement is in the relative rank
of the essay examination. In the earlier study only 0.6 per cent of the
questions were of the essay type, while in the more recent study 16 per cent
of the teachers appear to be using that type extensively. This apparent
revival of interest in the essay examination is probably less marked than
the difference in ranks between the two studies would indicate, since the
earlier tests were written for prize competition. In fact, Lee and Segel³
conclude that there was a definite shift toward objective tests.

TABLE 25

RANKINGS OF TEST ITEMS ACCORDING TO FREQUENCY OF USE AS REVEALED
BY TWO STUDIES

Type of Item        Conneau     Lee and Segel
Completion             1              1
True-false             2              2
Multiple-choice        3              4
Essay                 11              5
Problem                7              6
Matching              ...            ...

¹ Summarized by G. M. Ruch, The Objective or New-Type Examination, pages
188-190. Chicago: Scott, Foresman & Company, 1929.

Davis and Hensley⁴ report that "[high school] teachers prefer a
combination of essay and objective questions. Fifty-nine per cent of the
teachers use a combination of question types. Of the total group, four per
cent use the essay question exclusively and thirty per cent use objective
examinations only."

Comparative validity and reliability of various types of tests.
Ruch⁵ summarized the experimental studies available in 1929 and came
to the conclusion that "the new-types are at least as valid as the essay
examinations" and that the various objective types are "not greatly unlike
in validity." Ruch also concluded that for equal working times
recall and recognition types are not greatly dissimilar, although recall
tests tended to rank at the top and true-false at the bottom in most of
the studies.

² J. Murray Lee and David Segel, Testing Practices of High School Teachers,
pages ... United States Office of Education Bulletin No. 9, 1936.

³ J. Murray Lee and David Segel, op. cit., page 6.

⁴ ... Robert A. Davis, "What High-School Teachers Think and Do ...," ... and
Supervision, 38 ...

⁵ G. M. Ruch, op. cit., pages 281-306.

During the next ten years several experimental studies and excellent
summaries of the literature were published. Those by Kinney and Eurich⁶
and by Lee and Symonds⁷ were the most comprehensive. The latter study
points out that the problem of determination of the comparative merits of
different measuring instruments is not only "one of the most important";
it is also "one of the most poorly done."

Rinsland⁸ summarized the experimental literature to 1938 and
suggested two cautious conclusions:

1. One might conclude that the objective tests, with probably the exception of
the true-false type, are as valid as or perhaps more valid than the essay
or subjective examination, and that of all the objective forms the completion or
simple recall seems to be the most valid.

2. Generally speaking, the various types of objective tests have about equal
reliability when compared on the basis of working time. Differences of reliability
may be due primarily to the wording of individual items rather than to the objective
form.


Lindquist* takes the position that many of the studies which have
attempted to determine the comparative validities and reliabilities of various
test forms have been “inconclusive if not definitely misleading.” He points
out that these comparative studies have not always recognized the specific
nature of test validity, have overemphasized the importance of reliability,
and have often failed to control such factors as relative skill in constructing
the various test forms and the time allotments for the tests. In view of
these limitations Lindquist comes to the conclusion that in making a
selection from a number of test techniques in any specific test situation, or in
relation to any specific objective of instruction, the test constructor must
at present depend almost entirely upon logical considerations rather than
upon the experimental or empirical evidence that is now available.

* L. B. Kinney and A. C. Eurich, “A Summary of Investigations Comparing
Different Types of Tests,” School and Society 36:540-544, October 22, 1932.

* J. Murray Lee and Percival M. Symonds, “New-Type or Objective Tests: A
Summary of Recent Investigations,” Journal of Educational Psychology
24:21-39, February, 1933; 25:161-184, March, 1934.

* Henry Daniel Rinsland, Constructing Tests and Grading in Elementary and High
School Subjects, pages 2Ji^299. New York: Prentice-Hall, Inc., 1938.

* Herbert E. Hawkes, E. F. Lindquist, and C. R. Mann, The Construction and Use
of Achievement Examinations, pages WJ 103. Boston: Houghton Mifflin Company,
1936.

* Engelhart, “Examinations,” in Encyclopedia of Educational Research,
edited by Walter S. Monroe, pages 407-412. New York: The Macmillan Company,
1950.


A summary published in 1950* concludes that “few dependable
generalizations can be drawn from studies in this area” and that the
“overlappings are much more significant than minor differences between
averages.” Another article in the same volume* arrives at this conclusion:

The relative effectiveness of a test technique is specific rather than general.
It is probable that the validity of a test technique in most fields is more a
function of the ingenuity with which it is applied than it is of the test
technique employed.

Adequate comparisons between test techniques can therefore be made only for
specific material, when results are used for a specific purpose, when items are
constructed with specific insight and ability, and on the basis of validity
coefficients computed for equal amounts of testing time when each test is
administered at its optimum rate.


To be of practical guidance to the classroom teacher, research should
seek answers to such specific questions as the following: In the
measurement of what specific objectives in science is the true-false technique of
most worth? What testing technique is most effective for measuring
vocabulary in a foreign language? What distinctive value, if any, has the
rearrangement test in history? For the present, one’s choice of the tools of
science must depend chiefly upon one's personal judgment and general
educational philosophy rather than upon direct experimental evidence.
It is well to recognize that knowledge may exist and function on at least
four different levels. The lowest level involves mere recognition. A person’s
general reading vocabulary, as distinguished from his speaking and writing
vocabulary, is an example of knowledge where the ability to recognize is
the important thing. The next higher level involves recall. For knowledge
of many types to have value, one must be able to recall it when needed.
Familiar examples are one’s speaking and writing vocabulary, the names
and faces of acquaintances, and the ordinary number combinations in
arithmetic. Sometimes one needs to recall separate facts or isolated bits
of knowledge, but at other times the organization is important. The
person who is an entertaining conversationalist, an interesting letter writer,
or an effective public speaker must be able to present his knowledge in a
connected form. A still higher level of knowledge involves the ability to
interpret and evaluate. At this level the learner must have a sufficient
understanding of the material to be able to see it in its relationships to other
things. The exercise of discrimination and judgment is implied. The highest
level of all involves application. The person who is able to utilize
information acquired in one situation and who applies it to the intelligent solution
of problems in a new setting has arrived at true mastery.
It seems reasonable to assume that the type of test used must be
appropriate to the level of knowledge being measured. Tests of the
multiple-choice and matching types appear adequate for the first level of knowledge.

* “Achievement Tests,” in 1950 Encyclopedia of Educational Research.


Recall tests may be required for the other three levels. Wherever
organization is important, the essay type is perhaps more appropriate than the
simple recall. However, far more important than the type of test is the skill
with which it is used. Understanding, evaluation, application, and “any
other aspects of thinking can be measured by recognition tests, but to do
so requires a degree of skill that the regular classroom teacher rarely
attains.” It is also probably true that most recall tests measure memory only.


B. Simple-Recall Tests

Definition. The simple-recall test is here somewhat arbitrarily defined
as one in which the response is recalled by the pupil rather than selected
from suggested answers supplied with the item. It is differentiated from the
essay test on the basis of length of response. The response to the
simple-recall item is short, preferably a single word or phrase. Thus it is
sometimes called a short-answer test.

Advantages and limitations. The simple-recall test has the advantage of
familiarity and naturalness. It conforms to ordinary classroom practices and
reduces guessing as a factor for measurement, thus avoiding a common
criticism of objective tests. The simple-recall test is particularly valuable
in mathematics and the physical sciences, where the item takes the form of a
problem requiring computation. It also has application to test situations
presented in the form of maps, charts, and diagrams, in which the pupil is
required to supply responses in spaces provided.

One limitation of the simple-recall test is that it tends to measure highly
factual knowledge, consisting of isolated bits of information. Also, the
scoring is somewhat laborious unless the tests are carefully prepared.

I. Illustrations of Simple-Recall Tests

Below are a few illustrations which have been taken from standard tests.
An extensive discussion of this problem and illustrations of the various
forms used in a variety of school subjects on all educational levels are to be
found in Rinsland.*

Stone Reasoning Tests in Arithmetic*

1. James had 5 cents. He earned 13 cents more and then
   bought a top for 10 cents. How much money did he have
   left?                                               Answer ______

2. How many oranges can I buy for 35 cents when oranges
   cost 7 cents each?                                  Answer ______

Sones-Harry High School Achievement Test, Part II*

1. What instrument was designed to draw a circle?              ( )1
2. Write “25% of” as “a decimal times.”                        ( )2
3. Write in figures: one thousand seven and four hundredths.   ( )3

Cooperative General Mathematics Tests for College Students, Form 1934*

28. How many axes of symmetry does an equilateral triangle
    have?                                                      ( )
29. Eight is what per cent of 64?                              ( )
30. Write an expression that exceeds M by X.                   ( )
31. Solve the formula P = ^ for A.                             ( )

Iowa Placement Examinations, Chemistry Training*

1. The atomic weight of K is 39; of Cl, 35.5; of O, 16.
   What is the molecular weight of KClO3?                      ______
2. If 7 gm. of iron unite with 4 gm. of sulphur, how many gm.
   of iron sulphide will be produced?                          ______

Tests on Everyday Problems in Science, Unit XII*

What device is used in a vacuum-cleaner to pump air into
the dust bag?                                            (15) ______
What is the pressure in pounds of ordinary air per square
inch?                                                    (16) ______

An Exercise from a Biology Workbook*

Directions: As you locate each part using a hand lens on an actual specimen,


* Henry Daniel Rinsland, op. cit., pages 23-222.

* Devised by C. Stone, and published by Bureau of Publications, Teachers
College, Columbia University.

* Devised by W. W. D. Sones and David P. Harry, Jr., and published by World
Book Company.

* Devised by H. T. Lundholm and L. P. Siceloff, and published by Cooperative
Test Service.

* Devised by G. D. Stoddard and J. Cornog, and published by Extension Division,
State University of Iowa.

* Devised by C. J. Pieper and W. L. Beauchamp, and published by Scott, Foresman
& Company.

* Prepared by Arthur O. Baker and Lewis H. Mills to accompany their Dynamic
Biology Today. Chicago: Rand McNally & Company, 1943.




[Figure: ... of the Grasshopper]


The following items from an informal class test illustrate the possibilities
of recall tests with more than one response to each item.

For each event below, give the country, year, and person with whom you
associate it.

Event                                   Country     Year      Person

First psychological laboratory          ________    ______    ________

First general intelligence test         ________    ______    ________

First standardized achievement test     ________    ______    ________
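
Several of the standard-test items quoted above have a single numerical answer, which is part of what makes simple-recall items quick to score. The arithmetic can be checked with a short sketch. The variable names are invented for illustration, and the chemistry item is read here as asking about potassium chlorate, KClO3, which is an assumption about the formula intended.

```python
# Worked answers for a few of the simple-recall items quoted above.
# All names are illustrative; none come from the tests themselves.

# Stone Reasoning Tests: 5 cents, plus 13 earned, minus 10 spent on a top.
cents_left = 5 + 13 - 10

# Stone Reasoning Tests: how many 7-cent oranges 35 cents will buy.
oranges = 35 // 7

# Cooperative General Mathematics: eight is what per cent of 64?
per_cent = 8 / 64 * 100

# Iowa Placement Examinations: atomic weights as stated in the item.
atomic_weight = {"K": 39, "Cl": 35.5, "O": 16}
# Assumption: the compound is KClO3 (one K, one Cl, three O).
molecular_weight = (
    atomic_weight["K"] + atomic_weight["Cl"] + 3 * atomic_weight["O"]
)

# Iowa Placement Examinations: 7 gm of iron unite with 4 gm of sulphur.
iron_sulphide_gm = 7 + 4

print(cents_left, oranges, per_cent, molecular_weight, iron_sulphide_gm)
```

Each item yields exactly one short response, so a key for these items needs only one entry per blank.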


II. Rules and Suggestions for Construction

The simple-recall is one of the most familiar test forms and one of the
easiest to prepare. The main problem is how to phrase the test situations
so that they will call forth responses of a higher intellectual level than mere
rote memory, and so that they can be scored with a minimum expenditure
of time and effort.

1. The direct-question form is usually preferable to the statement form. It is more
natural for the pupil and is likely to be easier to phrase.

EXAMPLE: The first president of the United States was ______
BETTER: Who was the first president of the United States? ______

2. The questions should be so worded that the response required is as brief as possible,
preferably a single word, number, symbol, or at most a short phrase. This will objectify
and facilitate scoring.

3. The blanks provided for the responses should be in a column, preferably at the
right of the questions. This arrangement facilitates scoring and is more convenient
for the pupil. The illustrations above show various ways of arranging the answer
column.




4. The use of textbook language in wording the question should be reduced to the
minimum. Unfamiliar phrasing will reduce the possibility of correct responses that
represent mere meaningless verbal associations and also will eliminate the
temptation of pupils to memorize the exact language of the book.

5. The questions should be so worded that there is only one correct response. This is
a standard which is difficult to reach, since pupils are marvelously resourceful in
reading into questions interpretations which the teacher never intended. For
example, the question in ancient history, “Name two ancient sports,” elicited this
reply from an ingenious student: “Antony and Cleopatra.” This possibility would
not have arisen had the question taken this form: “What were two popular athletic
contests in ancient Greece?” All acceptable replies which are based on any
legitimate interpretation of the question should receive credit, and must be listed on
the scoring key. Extra care in wording the questions will save much time and
trouble later.
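
Rule 5 asks that every acceptable reply be listed on the scoring key. A minimal sketch of such a key follows, assuming replies are compared after ignoring capitalization and extra spaces. The item numbers, the key entries, and the function names are hypothetical and serve only to illustrate the idea.

```python
def normalize(reply):
    """Lower-case a reply and collapse runs of spaces."""
    return " ".join(reply.lower().split())


def score_simple_recall(replies, scoring_key):
    """Give one point for each reply found among the acceptable answers.

    replies     -- dict mapping item number to the pupil's reply
    scoring_key -- dict mapping item number to a list of acceptable replies
    """
    score = 0
    for item, acceptable in scoring_key.items():
        reply = normalize(replies.get(item, ""))
        if reply in {normalize(answer) for answer in acceptable}:
            score += 1
    return score


# Hypothetical key: item 1 admits two acceptable wordings.
scoring_key = {
    1: ["George Washington", "Washington"],
    2: ["1809"],
}
pupil = {1: "  washington ", 2: "1808"}
print(score_simple_recall(pupil, scoring_key))
```

Listing the alternatives in advance keeps the scorer from having to judge borderline replies one paper at a time.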

C. Completion Tests

Definition. The completion test may be defined as a series of sentences
in which certain important words or phrases have been omitted and blanks
substituted for the pupil to fill in. A sentence may contain one or more
blanks. The sentences in the test may be disconnected, or they may be
organized into a paragraph. Each blank counts one point.

Advantages and limitations. The mental processes which the pupil
must employ in supplying the responses required in completion tests are
very similar to those required in simple-recall tests, although perhaps on a
somewhat higher level. It is not surprising that the advantages and
limitations of these two types of tests are also similar. The completion test has
wide applicability, as far as subject-matter is concerned, but unless
prepared with extreme care is likely to measure rote memory rather than real
understanding, or it may turn out to be more a measure of general
intelligence or linguistic aptitude than of school achievement.

The scoring is likely to be even more laborious than that of simple-recall
tests. This is partly because the missing words are written in blanks
scattered all over the page, rather than in a column. While these limitations
cannot be entirely eliminated, they can be greatly reduced, as is evident
from the illustrations below.


I. Illustrations of Completion Tests

Stanford Achievement Test, Paragraph Meaning, 1940 Edition*

Directions [Abridged]: Write JUST ONE WORD on each line. Be sure to write
each answer on the line that has the same number as the missing word in the paragraph.

Terman andiub- 





In olden days men made their own pens from the quills
of feathers. It required considerable skill to cut a pen properly
so as to suit one's individual taste in writing. Students were
always on the lookout for good goose, swan, turkey, or other
bird feathers. Goose quills made the most satisfactory __1__
for general __2__, but schoolmasters liked pens made from the
__3__ of swan feathers because they fitted best behind the ear.


Public School Attainment Tests for High School Entrance*

3. Question: Did the team have a coach?
   Answer: No, they taught (3) how to play without any       (3) ______

4. Q. Did all of you have matches?
   A. Of course! Each one had (4) own waterproof box full.   (4) ______


Tests on Everyday Problems in Science, Unit XI*

A pry-pole is an example of a machine called the      (11) ______
A capstan is an example of a machine called the       (12) ______
A screw is an example of a machine called the         (13) ______
Your teeth are examples of machines called            (14) ______

Gregory Tests in American History*

2. The man who headed the first expedition to
   circumnavigate the globe was                             ______
7. The Articles of Confederation were in force from
   1781 to                                                  ______
   The “Old Liberty Bell” rang out the decision of
   Congress to be free from England in the year             ______

Cooperative English Test, Series 1*

20. Write on the lines to the right the contractions (shortened forms to represent
how the words are naturally spoken) for the seven groups of words underlined
in the following sentences. For instance, for do not you would write don't. You
need not copy the sentence, but only the seven contractions.

I have read his story, but I cannot              ______
believe that he will get a passing               ______
grade on it, for it is not well written          ______
and has not a clear-cut plot. The                ______
characters are not at all interesting; they are  ______
not even human.                                  ______



* Devised by Henry D. Rinsland and Roland L. Beck and published by Public School
Publishing Company. (The form of completion with all responses in a column instead
of staggered within sentences was devised by Rinsland.)

* Devised by C. J. Pieper and W. L. Beauchamp and published by Scott, Foresman
& Company.

* Devised by C. A. Gregory and published by C. A. Gregory Company.

* Devised by Sterling A. Leonard and others and published by Cooperative Test
Service.


II. Rules and Suggestions for Construction

Most of the suggestions made for constructing simple-recall tests apply
equally well to completion tests. The dangers to be avoided are largely the
same for both forms. A few suggestions may be offered, however, that have
special reference to completion items. The main problems of constructing
completion tests are three in number: (1) how to phrase the statements
so as to indicate the type of response desired; (2) how to avoid giving the
pupil unwarranted clues to the specific responses expected; and (3) how
to arrange the items so as to facilitate scoring. The first two suggestions
below apply to problem one, the next five apply to problem two, and the
last five suggestions are for problem three. In short, a good completion
statement gives a reasonable basis for determining the response desired
without providing unwarranted clues, and is arranged to facilitate scoring.


1. Avoid indefinite statements. The pupil is entitled to know the type of response
desired, and when this is done the scoring is far more rapid.

EXAMPLE: Abraham Lincoln was born in ______

BETTER: Abraham Lincoln was born in the state of ______

        The year of Abraham Lincoln’s birth was ______

The first statement fails to indicate whether the desired response is the date, the
place, or the circumstances of his birth. In that form legitimate answers might
be “February” or “1809,” for the date; “Kentucky,” or possibly “The South,”
for the place; and “poverty” or “a log cabin” for the circumstances of his birth.
By a slight change in wording the statement is made quite definite.

2. Avoid overmutilated statements. If too many key words are left out, it is
impossible to know what meaning was intended.

EXAMPLE: The ______ is obtained by dividing the ______ by the ______

In its present form, it is impossible to tell whether the statement refers to
educational measurement or to arithmetic.

BETTER:

1. The IQ is obtained by dividing the ______ by the ______

2. The ______ is obtained by dividing the ______ by the divisor.

3. Omit key words and phrases, rather than trivial details. If this is not done, the
response may be as obvious as the first example below, or as unnecessarily difficult
as the second example.

EXAMPLES:

Abraham Lincoln was ______ in February, 1809.

Abraham Lincoln was born in ______ County, Kentucky.

4. Avoid lifting statements directly from the text. This puts too great a premium
on rote memory.

5. Whenever possible avoid “a” or “an” immediately before a blank. These words
unnecessarily limit the responses that can be put in the blank.

EXAMPLE: Mary picked an ______ off the tree and ate it.
BETTER: Mary ate the ______ she picked off the tree.

It is apparent that the words “pear,” “peach,” “plum,” “pineapple,” and the like
could not be used in the first statement. The choice is thus narrowed down. The
second statement contains no specific determiner.

6. Make the blanks of uniform length. If the blanks vary in length, the pupil
has a clue to the response expected. Even more of a clue is afforded by
using a dot or a dash for each letter in the correct word.

EXAMPLE:

1. The second president of the United States was _____ from the
   state of _____________

2. The president in office during the Mexican War was ____ from
   the state of _________

BETTER:

1. The second president of the United States was __________ from the
   state of __________

2. The president in office during the Mexican War was __________ from
   the state of __________


7. Avoid grammatical clues to the answer expected.

EXAMPLE: The authors of the first performance test of intelligence were ______
BETTER: The first performance test of intelligence was prepared by ______


8. Choose statements in which there is only one correct response for the blanks.
The scoring is far more objective if only one specific word or phrase can be used
to complete the statement.

9. The required response should be a single word or a brief phrase. The more the
scorer has to read, the more time will be required.

10. Arrange the test so that the answers are in a column at the right of the sentences.
The illustrations above show various ways in which this may be done. When each
sentence contains but a single blank, the scoring is made easier if the blank comes
at the end. The Tests on Everyday Problems in Science and the Gregory Tests in
American History are examples. If the sentences contain more than one blank, the
scoring is more rapid if the blanks are numbered and the pupil is directed to write
his responses in the correspondingly numbered blank in the answer column at
the right. Rinsland* suggests that the following wording of the directions will be
clear to pupils above the fourth grade, although it may be necessary to explain
the word “correspondingly” in grades four to seven:

Directions: In each of the sentences below, one or more words, numbers, or dates
are needed in the numbered blank spaces to make the sentences complete and true.
Place the word or words in the correspondingly numbered blank to the right.

* Henry Daniel Rinsland, op. cit., page 56.

11. Prepare a scoring key which contains all acceptable answers. Although it is
desirable to have only one response which can be considered correct for each blank,
this is not possible in all cases. As a rule, a satisfactory key can be made by writing
in red the correct answers on a copy of the test.

12. Allow one point for each blank correctly filled. Avoid fractional credits and
unequal weighting of items on the basis of difficulty or importance.
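
Some of the suggestions above lend themselves to a mechanical check of a draft statement before it goes into a test. The sketch below is only illustrative. It assumes that a blank is typed as a run of three or more underscores, and the limit of two blanks per statement is an arbitrary threshold chosen here, not a figure from the text. The function name is hypothetical.

```python
import re

# Assumption: a blank is typed as three or more underscores in a row.
BLANK = re.compile(r"_{3,}")


def check_completion_item(sentence, max_blanks=2):
    """Return a list of warnings for one completion statement."""
    warnings = []
    blanks = BLANK.findall(sentence)
    if not blanks:
        warnings.append("no blank found")
    # Too many omissions may leave the meaning undeterminable.
    if len(blanks) > max_blanks:
        warnings.append("possibly overmutilated: too many blanks")
    # Blanks of differing length hint at the length of the answer.
    if len({len(blank) for blank in blanks}) > 1:
        warnings.append("blanks are not of uniform length")
    # An article just before a blank narrows the possible answers.
    if re.search(r"\b(a|an)\s+_{3,}", sentence, flags=re.IGNORECASE):
        warnings.append('"a" or "an" immediately before a blank')
    return warnings


drafts = [
    "Mary picked an ______ off the tree and ate it.",
    "Mary ate the ______ she picked off the tree.",
    "The ______ is obtained by dividing the ______ by the ______.",
    "The second president was ____ from the state of __________.",
]
for draft in drafts:
    print(draft, check_completion_item(draft))
```

A check of this kind catches only the surface faults. Whether a statement is definite enough, or admits more than one correct response, still has to be judged by the test maker.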


D. Alternative-Response Tests

Definition. An alternative-response test is made up of items each of
which admits of only two possible responses. The usual form is the familiar
true-false test. Other similar forms are right-wrong, correct-incorrect,
yes-no, same-opposite, and two-option multiple-choice.

Advantages and limitations. Obvious advantages of the
alternative-response test are its apparent ease of construction, applicability to a wide
range of subject-matter, objectivity of scoring, and wide sampling of
knowledge tested per unit of working time. The true-false test, a form
very popular with classroom teachers, has been the object of more research
and of more criticism than any other form of objective test. The
negative-suggestion effect and the factor of guessing are often pointed out as
limitations of this type of test. While the use of the correction formula appears
to make a fairly satisfactory adjustment for guessing in the total score,
the alternative-response is not well adapted to educational diagnosis. The
danger of negative suggestion when pupils see statements which are false
has apparently been overestimated, but perhaps it is wise not to use
true-false tests as pretests or with young children. In such cases it is better to
avoid the alternative-response test, or to use a question that can be
answered by yes or no instead of a declarative statement.
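
The correction formula is mentioned here without being written out. In its usual form the corrected score is the number right minus the number wrong divided by one less than the number of options per item, so that for true-false items it reduces to rights minus wrongs. The sketch below assumes that form; the function and parameter names are illustrative, and omitted items are counted neither right nor wrong.

```python
def corrected_score(rights, wrongs, options=2):
    """Score corrected for guessing: rights - wrongs / (options - 1).

    With two options (true-false) this is simply rights minus wrongs.
    Omitted items are not counted as wrong.
    """
    if options < 2:
        raise ValueError("an item needs at least two options")
    return rights - wrongs / (options - 1)


# A pupil marks 75 true-false items: 60 right and 15 wrong.
print(corrected_score(60, 15))
# The same formula applied to four-option items: 40 right, 12 wrong.
print(corrected_score(40, 12, options=4))
```

The adjustment applies to the total score only. It says nothing about which particular items were guessed, which is one reason the form is weak for diagnosis.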
Several modifications and alleged improvements of the true-false test
have been proposed. Barton,* for example, has suggested crossing out the
part of the statement that is in error, while other studies* have shown that
having pupils correct the wrong statements increases the reliability of the
test. Still others* have proposed that items be weighted according to
the judgment of the pupil, or be marked true, false, doubtful. All of these
suggestions add somewhat to the labor of scoring and have not received
wide acceptance. Furthermore, strictly speaking, when these modifications
are followed, the test is no longer of the alternative-response type. As a rule,
the most obvious way to “improve” the true-false test is also the best;
that is, make the test longer and prepare it more carefully. At least 75 items
are desirable, and 50 may be set as an absolute minimum, unless the test
covers a very narrow range or is used for instructional purposes only. One
advantage of the true-false test is that it can cover more items in the same
time than any other test type.


* W. A. Barton, Jr., “Improving the True-False Examination,” School and Society
34:544-546, October 17, 1931.

* Ernest E. Bayles and Ralph C. Bedell, “A Study of Comparative Validity as
Shown by a Group of Objective Tests,” Journal of Educational Research 23:8-16,
January, 1931. F. D. Curtis, W. C. Darling, and N. H. Sherman, “A Study of the
Relative Values of Two Modifications of the True-False Test,” Journal of Educational
Research 36:517-527, March, 1943. W. H. E. Wright, “The Modified True-False Item
Applied to Testing in Chemistry,” School Science and Mathematics 44:637-639,
October, 1944.

* “A Method for Correcting for Guessing in True-False Tests and Empirical
Evidence in Support of It,” Journal of Social Psychology 3:359-362, August, 1932.

Should pupils be advised to look over true-false tests and change the
answers on doubtful items? Several studies have attempted to answer this
question. Hill* made an extensive investigation of the problem and came
to the conclusion that there is “not much advantage to be gained by
changing one's answers on a true-false test,” although the advantage was
somewhat greater in changing from true to false than in the reverse. There
is some evidence that the better pupils profit most from rechecking and
revising their work. Even if the scores are not always improved, it is
probably a good work habit to encourage.

The low esteem in which test experts hold the alternative-response type
of test, especially the true-false form, is indicated by the infrequency with
which it has appeared in recent standardized achievement tests. This is due
chiefly to its weakness as an instrument of diagnosis and to the fact that
such tests must be made much longer than other objective tests in order to
secure comparable reliability. Although this type of test has been greatly
overworked by classroom teachers, it does have a legitimate, though
restricted, use in informal tests. For example, the true-false test seems well
adapted to testing the persistence of popular misconceptions and
superstitions. Ordinary alternative-test situations are encountered in which it is
difficult or impossible to make more than two plausible responses for a
multiple-choice test. There are many troublesome situations of this sort in
language usage. Common examples include the case forms in pronouns,
correct use of singular and plural verbs, confusions of past tense and past
participles, the use of sit and set, lay and lie, and many others. A safe rule
would be to restrict the use of the alternative-response test to those situations
to which other test forms are inapplicable, and then to give particular care to
the wording of the items.
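
The remark that true-false tests must be made much longer to secure comparable reliability can be given a rough numerical form. The passage itself states no formula; the sketch below uses the Spearman-Brown formula, the standard estimate of how reliability changes when a test is lengthened with comparable items. The reliability figures are hypothetical and the function names are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Estimated reliability of a test lengthened by length_factor."""
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )


def length_factor_needed(reliability, target):
    """How many times longer a test must be to reach the target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))


# Hypothetical case: a 50-item true-false test with reliability .60.
print(spearman_brown(0.60, 2))            # the test doubled to 100 items
print(50 * length_factor_needed(0.60, 0.90))  # items needed to reach .90
```

Under these assumed figures, doubling the test raises the estimate only to about .75, and reaching .90 would take roughly six times as many items.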


I. Illustrations of Alternative-Response Tests


California Achievement Tests, Advanced Battery, Form

Directions: In the following sentences mark, as you have been told, the number
of each correct word.

Test 5, Section C

30. (1 Isn't  2 Aren't) the baskets filled with flowers?
47. I approve of (1 his  2 him) going.



•• George E Kill The rffocl of Changed BwponseB m fme-False Teete Jmrnut 
Tfegtand aod puhoehad by Oaldoro,, 

Test Bureau 



For each statement given below that is a complete sentence, mark YES; for each
that is not, mark NO.

51. When we approached the deserted farmhouse at night          YES  NO  51

56. The mountains resounded with peals of thunder which
    indicated the storm's fury                                  YES  NO  56


Iowa Silent Reading Tests, New Edition, Sentence Meaning,
Elementary, Form Am*

Directions: Read each question. If the answer is “Yes,” fill in the space under
YES in the margin. If the answer is “No,” fill in the space under NO. Study the
sample. Do not guess.

1. Is a dime less in value than a nickel?            YES  NO

2. Can we see things clearly in a thick fog?         YES  NO

3. Is geography studied in public schools?           YES  NO


Allport-Vernon-Lindzey “Study of Values”*

Directions: A number of controversial statements or questions with two
alternative answers are given below. Indicate your personal preferences by writing
appropriate figures in the boxes to the right of each question. For each question
you have three points that you may distribute in any of the following combinations:

If you agree with alternative (a) and disagree with (b), write [3 in the first box
and 0 in the second box].

If you agree with (b), disagree with (a), write [0 in the first box and 3 in the
second box].

If you have a slight preference for (a) over (b), write [2 in the first box and 1
in the second box].

If you have a slight preference for (b) over (a), write [1 in the first box and 2 in
the second box].

1. The main object of scientific research should be the discovery of truth rather
than its practical applications. (a) Yes; (b) No.

10. If you were a university professor and had the necessary ability, would you
prefer to teach: (a) poetry; (b) chemistry and physics?
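
The directions for the Study of Values allow exactly four ways of dividing the three points between the two boxes. A scorer could check a marked paper against that rule with a sketch like the following; the names are hypothetical and the check covers only the combinations listed in the directions quoted above.

```python
# The four combinations permitted by the directions: (first box, second box).
ALLOWED_RESPONSES = {(3, 0), (0, 3), (2, 1), (1, 2)}


def is_valid_response(first_box, second_box):
    """True if the two figures form one of the permitted combinations."""
    return (first_box, second_box) in ALLOWED_RESPONSES


print(is_valid_response(2, 1))  # slight preference for (a)
print(is_valid_response(3, 1))  # more than three points distributed
```

Every permitted pair totals three points, so a paper can also be screened by checking the sum of each row.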



Tests in English Fundamentals: Grammar*

Directions: Classify the italicized words in the sentences below as adjectives or
adverbs by placing check marks in the proper columns.

                                                  Adjective    Adverb

 3. That was a wise remark.                   3   ________    ________

 6. Those flowers smell sweet.                6   ________    ________

11. You can hardly expect him to wait.       11   ________    ________


* Published by World Book Company.

* Devised by Gordon W. Allport, Philip E. Vernon, and Gardner Lindzey and
published by Houghton Mifflin Company, 1951.

* Devised by Davis and published by Ginn and Company.

Hie lotta Lvefj Puoil Tests in Basic Skills** 
iJinrcnos's In each of tJie follmwng 'ententes there arc two or more numbered 
wonls or phn'cs mclo'wl m brackets If >ou think llie ^rsi word or plira^c is 
correct place an \ in the firtl box of the corresponding row on the answer sheet 
If 'Oil think the fccon I answer is correct place an A in the second box of the 
projior row, etc 


fed 


[2 anj 


indiistrioui man 


f 1 has 1 
5-1 M> father S > 

1 2 hasn t j 

02 I want C'trjonc to help 


Qo inonc> 

1 himself I 

2 themselves j 


Cooperative Plane Geometry Test, Revised Series Q

Directions: Read these statements and mark each one in the parentheses at the 
right with a plus sign (+) if you think it is always true, or with a zero (0) if you 
think it is always or sometimes false.

 1. The opposite angles of a parallelogram are equal                  1( )

 2. A diameter of a circle divides the circle into two equal parts    2( )

17. If two triangles are similar, their areas are in the same 
    ratio as the medians drawn to corresponding sides                17( )

18. All similar polygons are equilateral                             18( )


Tests on Everyday Problems in Science, Unit III

Directions: There are 25 incomplete statements in this test, each followed by 
parts (a), (b), (c), and (d). One or more of these parts, or perhaps none of them, 
correctly complete the incomplete statement. You are to place a plus sign (+) in 
the parentheses (near the right margin) opposite each part which correctly 
completes the statement and a minus sign (-) opposite each part which does not 
correctly complete the statement.

Devised by H. A. Greene and published by Extension Division, State University 
of Iowa, 19T9.

Devised by Emma Spaney and L. P. Siceloff and published by the Cooperative 
Test Service.

Devised by C. J. Pieper and W. L. Beauchamp and published by Scott, Foresman 
& Company.



THE CONSTRUCTION OF TEACHER-MADE TESTS


13. Minerals in our food supply 

(a) furnish heat and energy to the body ( ) 

(b) are the only materials of which cells can be built ( ) 

(c) are good regulators of certain of the body activities ( ) 

(d) help particularly to build bone and blood ( ) 


Cooperative Solid Geometry Tests

Directions: Read these statements and mark each one in the parentheses at the 
right with a plus sign (+) if you think it is true, or with a zero (0) if you think 
it is false, wholly or in part.

 4. Any number of planes may be passed through a given straight line ( ) 

27. Two planes parallel to the same straight line are parallel to each other ( ) 

41. The square of a diagonal of a cube is three times the square of its edge ( ) 

George Washington University English Literature Test

T F 1. "Il Penseroso" describes the charms of a merry social life. 

T F 4. "Pilgrim's Progress" is one of the greatest prose allegories in 
literature. 

T F 8. In his poem "The Bells," Poe describes the process of making bells. 


II. Rules and Suggestions for Construction 
The true-false test is often thought to be one of the easiest types to 
prepare. This superiority is more apparent than real, however. Experienced 
test makers are convinced that no test form demands greater skill. Unusual 
care must be exercised in wording true-false statements so that the content 
rather than the form of the statement will determine the response. The 
aim should be to phrase the statement so as not to make its meaning 
needlessly obscure on the one hand, nor to provide unwarranted clues on the 
other. This balance requires a delicate skill of adjustment that is rare 
among makers of informal tests. The following specific suggestions may be 
found helpful in constructing true-false tests. Many of the suggestions for 
constructing multiple-choice tests that are found in the next section are 
also applicable here. 


1. Avoid specific determiners. It has been found that strongly worded statements 
are much more likely to be false than true, while moderately worded statements are 
much more likely to be true than false. Examples of the former are those 
containing "never," "no," "none," "nothing," and the like; examples of the latter 
are those containing "may," "sometimes," "often," "as a rule," and the like. If 
the proportion of true and false statements is kept about equal for any such 
expression, that expression ceases to be a specific determiner that affords a clue 
to the answer. 

2. Avoid a disproportionate number of either true or false statements. Since 
several studies suggest that false statements are more valid than true statements, 
the suggestion is sometimes made that the test should have more false statements 
than true. If this were seriously done, however, the validity of the false 
statements would probably be reduced, since the pupil would then tend to mark all 
doubtful statements false. Tell the students that approximately half of the 
statements are true, the other half false. 

... and others, and published by Cooperative Test Service. 

... and others, and published by Center for Psychological Service. 


3. Avoid the exact language of the textbook. Lifting true statements directly 
from the textbook, or making false statements by changing a single word or 
expression, puts too great a premium on rote memory. 

4. Avoid trick statements. These are usually statements which appear to be true 
but which are really false because of some inconspicuous word or phrase. 

Poor:   1. "The Raven" was written by Edgar Allen Poe. 
        2. The battle of Hastings was fought in 1066 B.C. 
Better: 1. "The Raven" was written by Edgar Allan Poe. 
        2. The battle of Hastings was fought in 55 B.C. 

Also avoid "double-headed" statements like the following one (especially if they 
are partly true and partly false): Poe wrote "The Gold Bug" and "The Scarlet 
Letter." 


5. Avoid double negatives. Such statements are especially bad, since pupils well 
versed in English grammar might conclude that two negatives equal an affirmative, 
while other pupils would interpret such statements as emphatic negatives. 

6. Avoid ambiguous statements. With one interpretation the statement may be 
true and with another equally plausible interpretation it may be false. It is 
impossible to tell what is being measured when a statement has more than one 
legitimate interpretation. 

7. Avoid unfamiliar, figurative, or literary language. The experience of the learner 
must be considered. A statement is badly worded when a pupil who understands 
the point involved misses it because of the language employed. 

8. Avoid long statements, especially those involving complex sentence structure. 
Same reason as for the preceding suggestion. 

9. Avoid qualitative language wherever possible. Quantitative language conveys 
more exactly the meaning intended. Expressions such as "few," "many," "large," 
"small," "old," "young," "important," "unimportant" are vague and indefinite. 

10. Require the simplest possible method of indicating the response. Instead of 
requiring the pupil to write True and False or Yes and No, let him write T and F, 
Y and N, or underline the correct response. The symbols "+" for true and "0" 
for false are so distinct as to make scoring still easier. When the pupil must choose 
between two words or expressions, the responses should be numbered so that they 
can be indicated by writing the correct number. 

11. Indicate by a short line or by ( ) where the response is to be recorded. The 
responses may be arranged in a column at either the left or right of the statements. 
Most scorers prefer the answers at the right. 

12. Arrange the statements in groups. There is some advantage in scoring if the 
items are arranged in groups of five, with double spacing between each group. 
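
The first two suggestions lend themselves to a quick mechanical check of a draft test. The following is only a minimal sketch in Python, not a procedure given in this chapter: the word lists are taken from the examples under suggestion 1, and the function names `determiner_tally` and `true_share` are hypothetical. It tallies, for each listed expression, how many keyed-true and keyed-false statements contain it, and reports the share of true statements in the whole test.

```python
import re
from collections import Counter

# Assumed word lists, copied from the examples under suggestion 1;
# a teacher would extend them to suit the subject matter.
STRONG = ("never", "no", "none", "nothing")
MODERATE = ("may", "sometimes", "often")


def determiner_tally(items):
    """items: list of (statement, key) pairs, where key is True or False.

    Returns {word: Counter({True: n, False: m})} for each listed word
    that appears as a whole word in at least one statement.
    """
    tally = {}
    for statement, key in items:
        words = set(re.findall(r"[a-z]+", statement.lower()))
        for word in STRONG + MODERATE:
            if word in words:
                tally.setdefault(word, Counter())[key] += 1
    return tally


def true_share(items):
    """Fraction of the statements that are keyed true (suggestion 2)."""
    return sum(1 for _, key in items if key) / len(items)


if __name__ == "__main__":
    draft = [
        ("Metals never rust.", False),
        ("Water never flows uphill unaided.", True),
        ("No mammal lays eggs.", False),
        ("Iron may rust in damp air.", True),
    ]
    for word, counts in determiner_tally(draft).items():
        print(word, "true:", counts[True], "false:", counts[False])
    print("share of true statements:", true_share(draft))
```

An expression whose true and false counts are badly out of balance is a candidate specific determiner; a share of true statements far from one half runs against suggestion 2.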


E. Multiple-Choice Tests 

Definition. A multiple-choice test is made up of items each of which 
presents two or more responses, only one of which is correct or definitely 
better than the others. Each item may be in the form of a direct question, 
an incomplete statement, or a word or phrase. This form of test is to be 
distinguished from the multiple-response type, which requires that two or 
more responses be made to a single item. 

Possibilities and limitations. The multiple-choice type of item is 
usually regarded as the most valuable and most generally applicable of all 
test forms. Lee regards it as "one of the best means for testing judgment 
that is available." Lindquist asserts that it is "definitely superior to other 
types" for measuring such educational objectives as "inferential reasoning, 
reasoned understanding, or sound judgment and discrimination on the part 
of the pupil." Cronbach regards it as being practically free from "response 
sets," the tendency for examinees to select a given option position 
more often than would be predicted on the basis of chance alone. 
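
A response set of this kind can be looked for with a simple tally of the option positions a pupil (or a class) has chosen. The sketch below, in Python, is illustrative only: the function name `position_tally` is hypothetical, and the chance expectation it reports assumes that the keyed answers are spread evenly over the positions.

```python
from collections import Counter


def position_tally(choices, n_options):
    """choices: option positions chosen (1-based), one entry per item answered.

    Returns {position: (times_chosen, expected_by_chance)}. The expectation
    assumes keyed answers are spread evenly over the positions.
    """
    counts = Counter(choices)
    expected = len(choices) / n_options
    return {pos: (counts.get(pos, 0), expected)
            for pos in range(1, n_options + 1)}


if __name__ == "__main__":
    # Eight four-option items; this pupil chose position 1 five times.
    for pos, (seen, expected) in position_tally([1, 1, 1, 2, 3, 1, 4, 1], 4).items():
        print(f"position {pos}: chosen {seen}, expected about {expected}")
```

A position chosen far more often than its chance expectation is only a hint of a response set; with few items such differences arise easily by accident.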
One study suggests fourteen types of questions which may be asked 
in multiple-choice test items. The list is not all-inclusive and does not 
intend to prescribe the exact language to be used but serves as a guide 
in formulating the questions. 


1. Definition 

a. What means the same as ______? 
b. What conclusion can be drawn from ______? 
c. Which of the following statements expresses this concept in different form? 

2. Purpose 

a. What purpose is served by ______? 
b. What principle is exemplified by ______? 
c. Why is this done? 
d. What is the most important reason for ______? 

3. Cause 

a. What is the cause of ______? 
b. Under which of the following conditions is this true? 

4. Effect 

a. What is the effect of ______? 
b. If this is done, what will happen? 
c. Which of the following should be done (to achieve a given purpose)? 

5. Association 

What tends to occur in connection (temporal, causal, or concomitant 
association) with ______? 


correvt forms ^ EPEli-h usage and spelling tests to nave several 

S7h“lem. which la to be cho-eo in 

sNewlorl. 



6. Recognition of Error 

Which of the following constitutes an error (with respect to a given 
situation)? 

7. Identification of Error 
a. What kind of error is this? 
b. What is the name of this error? 
c. What recognized principle is violated? 

8. Evaluation 

What is the best evaluation of ______ (for a given purpose) and for what 
reason? 

9. Difference 

What is the important difference between ______? 

10. Similarity 

What is the important similarity between ______? 

11. Arrangement 

In the proper order (to achieve a given purpose or to follow a given rule), 
which of the following comes first (or last, or follows a given item)? 

12. Incomplete Arrangement 

In the proper order, which of the following should be inserted here to 
complete the series? 

13. Common Principle 

All of the following items except one are related by a common principle. 
a. What is the principle? 
b. Which item does not belong? 
c. Which of the following items should be substituted? 

14. Controversial Subjects 

Although not everyone agrees on the desirability of ______, those 
who support its desirability do so primarily for the reason that ______. 

Unusual care must be exercised in the construction of multiple-choice 
items in order to avoid the inclusion of irrelevant or superficial clues, and 
to insure that the tests measure something more than the memory of factual 
knowledge. The value of multiple-choice tests in diagnosis depends upon 
the skillful selection of the incorrect choices presented in the items. 


I. Illustrations of Multiple-Choice Tests 


The items below, taken from standard tests, illustrate several different 
arrangements of multiple-choice tests in a variety of subjects. This type 
of test is widely used in all school subjects, on all educational levels, and for 
measuring a variety of teaching objectives. 
Special attention should perhaps be called to two of the illustrations, 
both of which are suggestive to teachers in making informal tests. The 
Nelson High School English Test illustrates the possibility of testing 
punctuation with a minimum of scoring labor. The Cooperative Test of Social 
Studies Abilities shows how objective tests may be used to test more than 
the memory for factual knowledge. This is a good example of a test of the 
pupil's ability to interpret facts, an ability which is an important aspect 
of thinking. 

Ellis Weitzman and Walter J. McNamara, "Apt Use of the Inept Choice in 
Multiple-Choice Testing," Journal of Educational Research, 39: 517-522, March, 1946. 

** These tests are not all equally good, however. The reader will note that some of 
them are not wholly consistent with the principles set forth in this chapter. 

Kuhlmann-Finch Intelligence Tests, Test IV 

13. Early is to begin as late is to 

       1       2       3       4       5 
     start    end    awake   enter   prompt      13 ____ 

22. Flour is to bread as sugar is to 

       1       2       3       4       5 
     sweet   candy   fruit   cook     eat        22 ____ 

The Modern School Achievement Tests: Language Usage 
Directions: In each sentence, choose the word or group of words that makes the 
best sentence. Then on the dotted line at the right copy the number that is before 
the correct form. 

                         1 off 
 4. I borrowed a pen     2 off of     my brother 
                         3 from 

                               1 your 
 7. Every student must do      2 his       best 
                               3 their 

            1 has got 
17. He      2 has            his moIu with him 
            3 has gotten 

The Barrett-Ryan Literature Test: Silas Marner 

A ( ) An episode that advances the plot is the: 1 murdering of a man 2 kidnapping 
of a child 3 stealing of money 4 fighting of a duel 
B ( ) Dolly Winthrop is: 1 an ambitious society woman 2 a frivolous girl 
3 a haughty lady 4 a kind, helpful neighbor 
C ( ) A chief characteristic of the novel is: 1 humorous passages 2 portrayal 
of character 3 historical facts 4 fairy element 

Wesley Test in Political Terms 

1. An embargo is 

1 a law or regulation 2 a kind of boat 3 an explorer 

4 a foolish adventure 5 an embankment ( ) 

2. An injunction is a 

1 part of speech 2 wreck 3 union of two things 

4 court order 5 form of advice ( ) 


tion^TSB^JcTu'* ^ “dG L Bell, and published by Educa 

TeShmS^e EmlL™ “ ^ Scbrammel aud publcbed by 

Devised by E. B. Wesley and published by Charles Scribner's Sons. 




Unit Scales of Attainment in Foods and Household Management 

2. The spoon should be placed 

1 at the top of the plate 

2 at the left of the fork 

3 in the spoon holder on the table 

4 at the right of the knife 

40. We get the most calories per pound from 
1 proteins 2 carbohydrates 
3 fats 4 mineral matter 

5 vitamins 

Traxler Silent Reading Test, Word Meaning 
1. The commendation is deserved 

(1) success (2) blow (3) popularity (4) good fortune 
(5) praise 

9. His actions received condemnation 

(1) approval (2) applause (3) censure (4) sympathy 
(5) contempt 


Cooperative French Test, Junior Form, 1936 

2. Quand on vous pose une question, il faut 

1 répondre, 2 se taire, 3 se sauver, 4 tourner le dos, 5 baisser la 
tête ( ) 

7. Cette dame est ma grand'mère; je suis 

1 son fils, 2 son neveu, 3 son frère, 4 son cousin, 5 son petit-fils ( ) 

16. J'ai deux frères, Jean et Paul. Jean a sept ans, Paul en a treize et moi 
j'ai douze ans. Qui est le plus jeune? 

1 Jean, 2 Paul, 3 moi, 4 dix ans, 5 les deux frères ( ) 

Nelson High School English Test 

Directions: Some of the sentences contain errors in punctuation; some of them are 
correct. If you think some mark is not needed, cross out the letter indicating that 
mark under the word "Omit." If you think some additional mark is needed, cross 
out the letter indicating that mark under the word "Add." If you think the exercise 
is correct, cross out the letter r. Key: a, apostrophe; c, comma; d, dash; 
e, exclamation point; h, hyphen; p, period; q, quotation mark; s, semicolon. 

                                                Add   Omit   Right 

 1. You must elect a chairman, three 
    judges and an official timekeeper 
 6. He said "that either you or I must 
    go." 

 8. The car which John is driving is a 
    new one 

14. "Well I think highly of them Mary" 
    I said 


% h d s q T 

c s d 9 X >■ 
d q s c d X 
0 h 8 y P r 


Devised by Ethel B. Reeve and Clara M. Brown and published by Educational 
Test Bureau, Inc. 

Devised by Arthur E. Traxler and published by Public School Publishing Company. 

Devised by Jacob Greenberg and Geraldine Spaulding and published by 
Cooperative Test Service. 

Devised by M. J. Nelson and published by Houghton Mifflin Company. 





Cooperative Test of Social Studies Abilities, Experimental Form Q 
INTERPRETING FACTS 

Directions: The exercises in this part consist of a series of paragraphs, each 
followed by several statements about the paragraph. In the parentheses after each 
statement, put a 

1, if the statement is a reasonable interpretation, fully supported by the facts 
given in the paragraph; 

2, if the statement goes beyond and cannot be proved by the facts given in the 
paragraph; 

3, if the statement contradicts the facts given in the paragraph. 

[The sample exercise and the explanation are omitted.] 

I. The nineteenth century witnessed a rapid growth in Germany's industrial 
power. Like England, Germany came to have a fairly satisfactory balance between 
the amount of its export and import trade. Heavy exports of coke supplied full 
cargoes for ships to foreign ports and helped to balance heavy importations of raw 
materials. The imports especially provided a means for distributing freight rates 
to the advantage of the German trader competing overseas. By these means 
Germany was constantly obtaining larger portions of world trade. German wares 
were carried into every trading realm, and trade meant political as well as 
commercial power in foreign lands. 

1. Through growth in foreign trade Germany's industrial power 
   increased in the nineteenth century                                   1( ) 

2. Germany had an export trade equal in volume to that of England       2( ) 

3. Germany exported very little coke to foreign countries               3( ) 

4. England was unable to balance the tonnage of her import and 
   export shipments                                                      4( ) 

5. By reducing freight rates Germany was constantly gaining a greater 
   percentage of world trade                                             5( ) 

6. The sale of German wares in every part of the world resulted in 
   added political influence and commercial growth                      6( ) 


II. Rules and Suggestions for Construction 
The purpose of suggestions 1 to 5 below is to avoid unwarranted clues to 
the desired response; the purpose of suggestions 6 to 9 is to encourage 
responses on a high intellectual level; and the purpose of suggestions 10 to 
14 is to make the scoring as simple and rapid as possible. 

lerb IS raamlenl For example, if the 

n r o^el sotd (in ‘ 

tor^ is^mo>rlM“r^Und Skdt sWemmia The question 

■' ""'.'.Wone c„„,s.„„v, T.., Irrv. . 





4. Avoid using in the correct response the same words or phrases that occur in the 
question or incomplete statement. 

5. Arrange the responses so that the correct one occurs in random order. The pupils 
are likely to detect any regularly recurring pattern in the sequence of responses. 

6. Make all responses plausible. In phrasing multiple-choice test items, 
consideration should be given to the fact that the answer may be arrived at by 
eliminating the incorrect responses as well as by selecting the correct response 
directly. The aim should be to make each suggested response so plausible as to 
tempt pupils who have only superficial knowledge of the point involved. The 
plausibility of incorrect responses may be increased by using familiar, stereotyped, 
or textbook phraseology, or expressions very similar to those in the question or 
incomplete statement. 

7. At least four choices should be presented whenever possible. Increasing the 
number of plausible choices tends to reduce the guessing factor. Horst found, 
however, that when the incorrect responses are of equal difficulty the chance 
element is less than when the choice is among a greater number of responses with a 
wider range of difficulty. 

8. In testing for the understanding of a term or concept, the term should usually 
be presented first, followed by a series of definitions or descriptions from which the 
choice is to be made. If the order is reversed, so that from a series of terms the 
choice is made of the one that best fits the definition or descriptive statement, the 
selection frequently can be made based upon superficial verbal associations and not 
upon genuine understanding. 

9. To measure the higher levels of understanding, increase the homogeneity of the 
options provided. The following illustration from Lindquist shows how the degree 
of required discrimination increases with the homogeneity of the responses 
presented. 

A. Engel's law deals with 

1 the coinage of money 
2 the inevitableness of socialism 
3 diminishing returns 
4 marginal utility 
5 family expenditures 

B. Engel's law deals with family expenditures for 

1 luxuries 
2 food 
3 clothing 
4 rent 
5 necessaries 

C. According to Engel's law, family expenditures for food 

1 increase in accordance with the size of the family 
2 decrease as income increases 
3 require a smaller percentage of an increasing income 
4 rise in proportion to income 
5 vary with the tastes of families 

To respond correctly to A, all that is required is the knowledge that Engel's law 
deals with family expenditures. In B a knowledge of the specific item of expenditure 
is necessary. The maximum degree of discrimination, however, is required in C, 
where still more information is given. 


Paul Horst, "The Difficulty of a Multiple-Choice Test Item," Journal of 
Educational Psychology, 24: 229-232, March, 1933. 

Herbert E. Hawkes, E. F. Lindquist, and C. R. Mann, op. cit., pages 146-147. 




10. Require the simplest possible method of indicating a response. This usually 
means that the responses are lettered and the choice is made by indicating the 
letter of the response. In the first two or three grades, where key letters may 
not be understood, it will be better to permit the more natural response of 
underlining the correct answer. 

11. Indicate by a short line or by ( ) where the response is to be recorded or, better 
still, use a separate answer sheet. 

12. Arrange the items in groups. As a rule, groups of five will be suitable, although 
other numbers of items may sometimes be better. Double space between each 
group. 

13. Use the "correction for chance" formula (page 156) if the number of choices is 
fewer than six. If there are six or more responses suggested for each item, the gain 
in validity is seldom sufficient to warrant the labor of making corrections for chance. 

14. Group together all items with the same number of choices. This is especially 
desirable when the correction formula is to be used. 
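
Suggestion 13 can be illustrated with a short computation. The formula on page 156 is not reproduced in this section, so the sketch below assumes the conventional form, score = rights minus wrongs divided by (number of choices minus one); the function names and the `threshold` parameter are hypothetical, introduced only for illustration in Python.

```python
from fractions import Fraction


def corrected_score(rights, wrongs, n_choices):
    """Conventional correction for chance: R - W / (n - 1).

    Assumed form of the formula; omitted items count neither right nor wrong.
    """
    if n_choices < 2:
        raise ValueError("an item needs at least two choices")
    return Fraction(rights) - Fraction(wrongs, n_choices - 1)


def score_test(rights, wrongs, n_choices, threshold=6):
    """Apply the correction only when each item offers fewer than
    `threshold` choices, following suggestion 13."""
    if n_choices < threshold:
        return corrected_score(rights, wrongs, n_choices)
    return Fraction(rights)


if __name__ == "__main__":
    # 40 right and 8 wrong on five-choice items: 40 - 8/4
    print(score_test(40, 8, 5))
    # The same record on six-choice items is left uncorrected.
    print(score_test(40, 8, 6))
```

Because the divisor depends on the number of choices, the correction has to be computed separately for each group of items with the same number of choices, which is the practical reason behind suggestion 14.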

F. Matching Tests 

Definition. A matching test typically consists of two columns, each item 
in the first column to be paired with a word or phrase in the second column 
upon some basis suggested. In the simplest form of matching test the 
number of responses is exactly the same as the number of items. Frequently, 
matching tests are made which provide more responses than are required. 
Sometimes the items in the first column are incomplete sentences, each of 
which requires a word or phrase from the second column for its completion. 
Occasionally two, or even more, columns of responses are given, from each 
of which a choice must be made for each item in the first column. The 
matching test is also useful for identifying numbered places or parts on 
maps, charts, and diagrams. 

Advantages and limitations, llierc aie many types of leaining which 
involve the association of two things in the mind of the learner Common 
examples are the following Events and dates events and persons, events 
and places terms and definitions, foreign words and English equivalents, 
laws and illustrations rules and examples, tools and their use, and the like 
The matching test is a very convenient form of exercise for measuring such 
learning In the words of I indquist ‘ The matching exercise is parti^'iilarly 
well adapted to testing lu who, what when, and where types of situations, 
or for naming and identifying abilities 
Its pnncipal limitations are as follows (1) U is not well adapted to the 
measurement of understanding as distinguished from mere memory, 
(2) With the exception of the true-false test, the matching test is the form 
most hkely to include irrelevant clues to the correct response and (3) Un- 
less skillfully made, it is time consuming for the pupil The suggestions 
that follow arc designed to overcome the last tw o limitations f he matching 
test can hardly be designed to measure genuine understanding of a high 
level or the ability to interpret complex relationships 




I. Illustrations of Matching Tests 

The following examples from standard tests illustrate different 
mechanical arrangements of matching tests in a variety of subjects. 


Every Pupil Test in Physics 

Directions: Read each definition or description. Then select from the Answer List 
the term that fits it and write the number of that term on the dotted line in front 
of the definition. The answer to the sample is (Power), so 18 is written on the 
dotted line. 
ANSWER LIST (Arranged alphabetically) 


 1 Adhesion                 13 Inertia 
 2 Centrifugal              14 Insulator 
 3 Centripetal              15 Kinetic 
 4 Cohesion                 16 Mechanical Advantage 
 5 Conduction               17 Potential 
 6 Conductor                18 Power 
 7 Convection               19 Radiation 
 8 Density                  20 Relative Humidity 
 9 Efficiency               21 Specific Gravity 
10 Energy                   22 Specific Heat 
11 Heat of Fusion           23 Surface Tension 
12 Heat of Vaporization     24 Work 

18 Sample. The rate of doing work 

 1. Weight per unit volume 
 2. Mutual force of attraction between like molecules 
 3. Tendency of a body to resist any change in its state of rest or motion 
 4. Tendency of surface of a liquid to contract as much as possible 
 5. Capacity for doing work 
 6. The ratio of resistance overcome to effort exerted 
 7. The product of a force and the distance through which it acts 
 8. Ratio of output to input 
 9. The energy a body possesses because of its position 
10. The number of calories required to melt one gram of a substance 
11. Amount of water vapor the air holds compared to what it could hold at 
    the same temperature 
12. Transfer of heat from a hot to a cold body by molecular collision 
13. Transfer of heat by means of ether waves 
14. The force pulling the body toward the centre of rotation 
15. A substance that conducts heat or electricity very poorly or almost not 
    at all 


Cooperative Test of Social Studies Abilities, Experimental Form Q 

Directions: In which of the sources listed in the left-hand column would you 
look first to find the items listed in the right-hand column? Consider each group 
separately. Put the number of the best source in the parentheses after each item. 

1 Atlas                    51. A discussion of an important present-day 
2 Current History              issue in Congress                            51( ) 
3 Dictionary               52. The location of the ten largest cities in 
4 Economics textbook           the world                                    52( ) 
5 Encyclopedia             53. How to hyphenate the word ancmn              53( ) 
                           54. Amendments to the Constitution               54( ) 
                           55. A discussion of standards of living          55( ) 
                           56. The population of a particular small 
                               town                                         56( ) 

1 American history         57. List of news dispatches on CCC 
  textbook                     activities                                   57( ) 
2 Book of quotations       58. A short account of the early history of 
3 Library catalog              Manhattan Island                             58( ) 
4 National Geographic      59. The author of Weather in the Streets         59( ) 
  Magazine                 60. Information about the growth of slavery 
5 New York Times               in the United States                         60( ) 
  Index                    61. Who said, "Brevity is the soul of wit"       61( ) 
                           62. Pictures and story of recent developments 
                               in the TVA                                   62( ) 

1 Daily newspaper          63. The Pulitzer Prize awards of 1930            63( ) 
2 Readers' Guide to        64. Today's price quotations on stocks and 
  Periodical Literature        bonds                                        64( ) 
3 Time 
4 World Almanac 
5 Library catalog 

Devised by T. W. Brown and others and published by the Ohio State Department 
of Education, 1030. 

Devised by J. Wayne Wrightstone and published by Cooperative Test Service. 


Cooperative French Test, Junior Form 

Directions: Each of the English sentences and phrases below is followed by a 
translation in which there is a blank indicated in this way: ( ). The translation 
will be correct when one of the five numbered words, phrases, or endings listed at 
the left of the group is inserted in the blank ( ). Decide which of the five items 
will make the translation complete and correct and put its number in the 
parentheses at the right-hand edge of the page. 

1 ce          29. These books                ( ) livres                      ( ) 
2 ces 
3 cet         30. That school                ( ) école                       ( ) 
4 celles 
5 cette       31. That money                 ( ) argent                      ( ) 

VIII 
1 qui         38. What are they asking 
2 quoi            for?                       ( ) demandent-ils?              ( ) 
3 quelles     39. Who came down the          ( ) est descendu 
4 que             first?                     le premier?                     ( ) 
5 qu'         40. Which roads are the        ( ) routes sont les 
                  best?                      meilleures?                     ( ) 

XIII 
1 -se         50. They lighted several       On a allumé plusieurs feu 
2 -es             fires                      ( )                             ( ) 
3 -x          51. I didn't buy the           Je n'ai pas acheté les autr 
4 -s              other books                ( ) livres                      ( ) 
5 No          52. He had black hair          Il avait les cheveu ( ) 
  ending                                     noirs                           ( ) 

Devised by Jacob Greenberg and Geraldine Spaulding and published by 
Cooperative Test Service, 1936. 


Sones-Harry High School Achievement Test 

SECTION G (MATHEMATICS) 
IMPORTANT THEOREMS IN GEOMETRY 

Directions: In the parentheses after each geometric condition given below in 
Column 2 write the number of the results in Column 1 that could be proved by it. 

COLUMN 1 (RESULTS) 

 1 angles equal 
 2 triangles congruent 
 3 triangles similar 
 4 lines perpendicular 
 5 lines parallel 
 6 quadrilateral is a parallelogram 
 7 parallelogram is a rectangle 
 8 two arcs equal (in same or equal circles) 
 9 two chords equal (in same or equal circles) 
10 areas of polygons equivalent 

COLUMN 2 (CONDITIONS) 

66. If two opposite sides are equal and parallel                   ( )66 
67. If perpendicular to the same line                              ( )67 
68. If the sides are proportional                                  ( )68 
69. If they have equal arcs                                        ( )69 
70. If side-angle-side equal side-angle-side respectively          ( )70 
71. If they are parallelograms with equal bases and altitudes      ( )71 
72. If their central angles are equal                              ( )72 
73. If a tangent is drawn to the radius at point of contact        ( )73 
74. If corresponding parts of congruent triangles                  ( )74 
75. If one angle is a right angle                                  ( )75 


II. Rules and Suggestions for Construction 

The purpose of the first three suggestions is to avoid irrelevant clues and that 
of the remaining five is to reduce the amount of time required to take the test. 

1. Include only homogeneous material in each matching exercise. Do not mix in a 
single test such dissimilar associations as persons and events, dates and events, 
terms and definitions. Put short titles at the top of both columns to describe the 
contents accurately. For example: Column 1, Events; Column 2, Dates. 

2. Check each exercise carefully for unwarranted clues that may indicate matching 
pairs. For each item ask yourself this question: What is the least amount of 
information that must be known in order to select the right response? 

3. Avoid making the test too easy. The difficulty of a matching exercise may be 
increased by including more responses than needed and by using some of the 
responses more than once in the same test. 

4. One list should consist of single words, numbers, or brief phrases. In general, 
the column of short terms should contain the items from which the choice is made. 

5. The items in the response column should be arranged in systematic order. If the 
list consists of dates, they should be in chronological order. For other items, 
alphabetical order will assist the pupils in locating the desired response. The 
responses in the column should then be numbered consecutively. 

Devised by W. W. D. Sones and David P. Harry, Jr., and published by World 
Book Company. 





6. Indicate clearly the basis upon which matching is to be done. This should be 
specified both in the directions and in the column headings. The pupil will be told 
to put the NUMBER of the response selected in the answer space beside the test 
item. 

7. The matching exercise should contain at least five and not more than fifteen items. 
Longer lists waste time and shorter lists increase the possibility of guessing the 
correct response. 

8. All the items for the matching exercise should be on a single page. Turning the 
page back and forth in search of desired responses is both confusing and 
time-consuming. 
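
Several of these suggestions are mechanical enough to be sketched in a few lines. The Python example below is only an illustration under stated assumptions: the function name `build_matching_exercise` and its arguments are hypothetical, and it assumes the responses in one exercise are all of one kind (all dates as numbers, or all terms as text), as suggestion 1 requires. It enforces the five-to-fifteen limit of suggestion 7, admits extra responses (suggestion 3), and arranges and numbers the response column in systematic order (suggestion 5).

```python
def build_matching_exercise(pairs, extra_responses=()):
    """pairs: list of (item, correct_response) tuples of one homogeneous kind.

    Returns (numbered_responses, key): the response column sorted in
    chronological or alphabetical order and numbered from 1, and the
    scoring key as (item, response_number) pairs.
    """
    if not 5 <= len(pairs) <= 15:
        raise ValueError("use at least five and not more than fifteen items")
    # A set removes duplicates, so one response may serve several items.
    responses = sorted({resp for _, resp in pairs} | set(extra_responses))
    number_of = {resp: n for n, resp in enumerate(responses, start=1)}
    key = [(item, number_of[resp]) for item, resp in pairs]
    return list(enumerate(responses, start=1)), key


if __name__ == "__main__":
    events = [
        ("Battle of Hastings", 1066),
        ("Magna Carta signed", 1215),
        ("Columbus reaches America", 1492),
        ("Declaration of Independence", 1776),
        ("Battle of Waterloo", 1815),
    ]
    column, key = build_matching_exercise(events, extra_responses=(1588, 1914))
    for number, date in column:
        print(number, date)
    for event, number in key:
        print(event, "->", number)
```

Sorting numbers gives chronological order for dates and sorting text gives alphabetical order for terms, so the same routine serves both cases; mixing the two kinds in one exercise raises an error, which is consistent with suggestion 1.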

G. Rearrangement Tests 

Appendix C, pages 454-455, contains remarks about the preparation and 
scoring of rearrangement (ranking, sequence, chronology, continuity) items 
and should probably be consulted at this point. Table 48 on page 455 makes 
scoring them rather simple, thereby eliminating one of the chief objections 
to this item type. 

Part II of the Allport-Vernon-Lindzey "Study of Values" consists of 
15 ranking items, the examinee being asked to "Arrange these answers in 
the order of your personal preference by writing, in the appropriate box 
at the right, a score of 4, 3, 2, or 1. To the statement you prefer most 
give 4, to the statement that is second most attractive 3, and so on." Two 
of these items are: 

2. In your opinion, can a man who works in business all the week best spend
Sunday in—

a. trying to educate himself by reading serious books
b. trying to win at golf, or racing
c. going to an orchestral concert
d. hearing a really good sermon

13. To what extent do the following famous persons interest you—

a. Florence Nightingale
b. Napoleon
c. Henry Ford
d. Galileo






7 


The Construction and Use of Essay 
Examinations 


Stalnaker¹ compares the merits of essay and objective tests in a thorough
and impartial manner, concluding that both have considerable value when
properly used. The summary to his chapter, on page 530, is especially
interesting:

The essay test has been the subject of repeated and often unfair attacks by
psychologists and educationalists interested in the measurement of achievement as
a science. As a result, the essay test remains largely undeveloped, although it
continues to be used widely by the classroom teacher. The values claimed for it
have not been generally established. Yet it may well be a basic test form which,
properly controlled, can measure important outcomes of learning not yet otherwise
measured. It also has other potential values which have been described. It has
several important and unique advantages as an educative influence. The fact that
it continues to be a test form widely used by the teacher preparing his own test
would alone seem to justify further development and research.

For many years the College Entrance Examination Board² has been
concerned with the improvement of essay tests, particularly the increasing of
scoring agreement on English compositions. Its journal, The College Board
Review, published three times yearly, and the Annual Report of the Director³
contain valuable reports of work with essay tests.

To limit the use of informal teacher-made tests to those classified as
objective in type is an unwarranted restriction. The so-called traditional
test or essay examination still has a legitimate place in the modern school.
This chapter will review the advantages and limitations of this type of
test and offer suggestions for its improvement and use.

¹ John M. Stalnaker, "The Essay Type of Examination," Chapter 13 in E. F.
Lindquist (Editor), Educational Measurement. Washington, D.C.: American Council on
Education, 1951.

² Abbreviated CEEB and located at 425 West 117th Street, New York 27, N.Y.

³ The 53rd annual report covers the period October 1, 1952-September 30,
1953.


A. Limitations of the Essay Examination

As ordinarily employed, the essay examination has certain serious
limitations. It suffers in comparison with most forms of objective tests on the
three important criteria of a satisfactory measuring instrument: validity,
reliability, and usability.

Low validity. In the first place, the essay examination as commonly
used has low validity. Several factors contribute to this condition. The
limited sampling of the essay examination is often pointed out. Ruch,⁴ for
example, produced evidence to show that the essay called forth less than
half the knowledge the average pupil actually possessed on the subject, as
determined by objective tests, and required twice the time to do it. The
essay also includes many irrelevant factors, such as the quality of the
spelling, handwriting, and English used, as well as bluffing, for which no
correction formulas exist. It has been suggested that the essay overrates the
importance of knowing how to say a thing and underrates the importance
of having something to say. In view of these limitations, the ordinary essay
examination has little validity as an instrument of diagnosis.

Low reliability. In the second place, the essay examination as
commonly used is low in reliability. Since short tests are usually less reliable
than long tests, the narrow sampling afforded by essay examinations would
tend to restrict their reliability. Still more serious is the subjectivity of
scoring. Numerous studies have shown that teachers cannot agree with
each other as to the values to be allowed examination papers of the essay
type. Studies have also shown that the same teachers cannot agree with
themselves on a second series of values assigned independently to the same
papers. Part of this is due to different standards of marking and different
weighting of the questions. Certain other factors, such as the physical and
mental condition of the person marking the papers, also tend to condition
the mark assigned a paper by a given teacher at any particular time. An
English poet states the situation as follows:⁵

'Twixt Right and Wrong the Difference is dim
'Tis settled by the Moderator's Whim
Perchance the Delta on your Paper marked
Means that his Lunch has disagreed with him


In a study⁶ made at the University of West Virginia, Ashburn came to
the conclusion that "the passing or failing of about 40 per cent depends,
not on what they know or do not know, but on who reads the papers," and
that "the passing or failing of about 10 per cent depends on when the
papers are read." It has been observed that the scores tend to rise as time
passes, and that the values assigned tend to be greatly influenced by those
allowed the paper immediately preceding. For example, one writer asserts
that "A C paper may be graded B if it is read after an illiterate theme,
but if it follows an A+ paper, if such can be found, it seems to be of D
caliber."⁷

⁴ G. M. Ruch, The Objective or New-Type Examination, page 54. Chicago: Scott,
Foresman and Company, 1929.

⁶ … Journal of Experimental Education 7:1-5, September 1938.

That this situation is not peculiar to American education is indicated by
the Examination Inquiry conducted by the International Institute of
Teachers College, Columbia University.⁸ In fact, one writer⁹ asserts that
evidence showed the unreliability of essay examinations in Europe was
"even more serious" than had been revealed many times in America.
In support of this rather surprising conclusion, he says: "In the English
studies, examiners were found to reverse their judgments almost completely
when asked to mark the same papers they had scored a year before."

Bowles'¹⁰ comments concerning England's examination system for
college entrance are illuminating. He concludes that it is quite deficient when
judged by American standards of reliability and statistical validity, but
that because of various safeguards for the individual "the system works"
well.

In fairness to the essay examination, however, it should be pointed out
that many of the studies reported have been with unimproved forms of the
examination given under unfavorable conditions. Often the essay
examination at its worst has been compared with an improved objective test.
Under such conditions, the former is bound to show up in an unfavorable
light. If objective tests had been scored under similar conditions, without
scoring rules or keys, the agreement of the scores would be less impressive.
As a matter of fact, even with scoring rules and keys, the agreement among
the scores on objective tests allowed by amateur scorers is far from perfect.
Under favorable conditions the agreement among scorers of essay
examinations approximates that reported for objective tests. One study¹¹ reports
that the average correlation coefficient between first and second scorings
of an essay test in history by three experienced scorers was .98. Another
study¹² reports that the median coefficient obtained between two independent
readings of certain College Entrance Board examinations was .97. All
of the coefficients were above .90, with the exception of English,
which was .81. It must be kept in mind, however, that these examinations
were so worded as to make the scoring more objective than is usually
possible with ordinary essay examinations.

⁷ John M. Stalnaker, "The Problem of the English Examination," Educational Record
17:41, Supplement No. 10, October 1936.

⁸ Published by the Bureau of Publications, 1936.

⁹ W. Carson Ryan, Jr., "The Seventh World Conference of the New Education
Fellowship, II," School and Society 44:364, September 19, 1936.

¹⁰ Frank H. Bowles, The College Entrance Examination Board 51st Annual Report of
the Director, 1951, pages 23-30. College Entrance Examination Board, 425 West 117th
Street, New York 27, New York, 1952.

¹¹ Roy E. Cochran and Charles C. Weidemann, "Improvement of Consistency of
Scoring the 'Explain' and 'Discuss' Essay Examination," a paper read before Section C
of the American Educational Research Association at Cleveland, March 1, 1939.


It should be noted that most studies having to do with the reliability
of examinations really show the reliability of marking the examination
rather than the reliability of the examination itself. A few studies have been
reported of the correlation between two forms of an essay examination
designed for a particular purpose which were given to the same pupils and
carefully marked by experienced examiners. McGregor and Ruch¹³ used
this procedure in studying eighth-grade examinations in sixteen subjects
from 952 pupils in eleven states. Each paper in the two sets of examinations
was marked independently by two experienced teachers. This study made
it possible to compare the reliability of the examination with the reliability
of marking the examination. The agreement of the two independent
markings of the same papers is represented by an average correlation of .62,
while the agreement of the two sets of examinations marked by the same
teacher is represented by an average correlation of only .43. One of Ruch's
students, Dr. W. E. Gordon,¹⁴ made a similar study of the New York
Regents' examinations with startlingly comparable results. He found the
average agreement of the two independent markings of the same papers
was .72, while the average agreement of the two sets of examinations
marked by the same teacher was only .42. Another study¹⁵ conducted at
the University of Chicago High School showed that two independent sets
of marks assigned by two experienced readers of essay examinations
agreed to the extent of .944 on Form A and .845 on Form B, but that the
correlation between Form A and Form B was only .60. These three studies
seem to agree on one important point: The reliability of marking the essay
examination is higher than the reliability of the examination itself.
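
The distinction drawn above can be made concrete with a small computation. This is only a sketch: the marks below are invented for illustration and come from none of the studies cited, and `pearson_r` is a hypothetical helper that computes the ordinary product-moment correlation, which is assumed here to be the coefficient those studies reported.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length mark lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 marks)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all marks are equal")
    return sxy / sqrt(sxx * syy)


# Invented marks for six pupils (not data from any study cited above).
form_a_reader_1 = [72, 85, 60, 90, 78, 66]
form_a_reader_2 = [70, 88, 65, 86, 80, 62]
form_b_reader_1 = [80, 75, 70, 82, 64, 73]

# Reliability of MARKING: the same papers read by two independent readers.
marking_r = pearson_r(form_a_reader_1, form_a_reader_2)

# Reliability of the EXAMINATION: two forms taken by the same pupils and
# marked by the same reader.
examination_r = pearson_r(form_a_reader_1, form_b_reader_1)

print(f"marking r = {marking_r:.2f}, examination r = {examination_r:.2f}")
```

With these invented marks the two readers agree closely on Form A, while the pupils' standing shifts considerably from Form A to Form B, which is the same pattern the three studies describe.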

Low usability. The essay examination also ranks low in usability. There
seems no escape from the fact that this type of examination is
time-consuming, both for the pupil and for the teacher. In fact, the additional
expenditure of time and energy over that needed for objective tests is so
serious a limitation that the use of essay examinations can be justified only
if it can be shown that the values realized are commensurate with this
investment.

¹² John M. Stalnaker, "Essay Examinations Reliably Read," School and Society …

¹³ … page 96. Chicago: Scott, Foresman and Company, 1929.

¹⁵ Arthur E. Traxler and Harold A. Anderson, "The Reliability of an Essay
Test in English," School Review … September 1935.





B. Advantages of the Essay Examination

Reliability and usability. Even the most enthusiastic advocate of
essay examinations would scarcely claim their superiority over objective
tests on the grounds of reliability or usability. The best that can be hoped
for essay examinations is that by the use of improved techniques their
reliability may approach that of objective tests. As regards usability, the
fact that the questions can be written on the blackboard is an advantage
only in those schools which lack duplicating facilities. The reduction in
time required to prepare essay examinations is more apparent than real,
if the work is well done. Whatever advantage arises therefrom is more than
offset by such considerations as the extra time demanded for giving and
scoring.

Validity. It is apparent that if the use of essay examinations can be
justified it must be upon the ground of their superior validity for certain
purposes. What, then, are the unique functions of these examinations?
Unfortunately, upon this crucial issue little experimental evidence exists.
One study¹⁶ indicated that about 30 to 40 per cent of the mental functions
measured by improved essay tests of the "compare and contrast" type were
not measured by true-false tests covering the same material. Two similar
studies by Cochran and Weidemann¹⁷ compared one-word fact tests and
essay tests of the improved "explain" and "discuss" types covering the
same material, and concluded that about 40 per cent of the mental
functions measured by the latter were not measured by the former. The
important question of just what unique mental functions each type of test
measures remains to be answered.

In the absence of experimental evidence, it is necessary to fall back on
logical considerations. The essay test appears to be useful for measuring
four objectives of instruction: functional information, certain aspects of
thinking, study skills and work habits, and a functioning social philosophy.
It will be noted that these objectives emphasize the functioning, rather
than the mere possession, of knowledge.

There would appear to be little justification for using essay tests for the
recall of knowledge in piecemeal fashion. Sims,¹⁸ however, analyzed 458
questions ordinarily classified as of the essay type, and found that fewer
than half in the high school and fewer than one in five in the elementary
school involved discussion, the others being almost equally divided
between simple-recall and short-answer questions requiring not more than
one sentence for a response. The Evaluation Committee of the Seattle
Schools¹⁹ came to the conclusion that the evaluation of growth in language
ability would require the use of several types of tests:

¹⁶ C. C. Weidemann and Lyndall Fisher Newens, "Does the 'Compare-and-Contrast'
Essay Test Measure the Same Mental Functions as the True-False Test?" Journal of
Psychology 9:430-449, October 1933.

¹⁷ Roy E. Cochran and Charles C. Weidemann, "'Explain' Essay Word Answer
Fact Test," Phi Delta Kappan 17:59-61, 75, December 1934, and "A Study of
Special Types of Tests," Phi Delta Kappan 19:113-115, 131, January 1937.

¹⁸ Verner Martin Sims, "Essay Examination Questions Classified on the Basis of
Objectivity," School and Society 35:100-102, January 16, 1932.

It is necessary to measure achievement at … Objective tests of the
multiple-choice and matching type could be used to measure achievement at the
recognition and recall levels. However, evaluating achievement at the level of
interpretation and evaluation would require essay-type tests as well as certain
kinds of objective tests. Evaluating achievement at the level of application would
seem to be done most effectively by essay tests, since this would involve
measuring the student's ability to utilize information learned in one situation
in the solution of problems in a new setting.


One other advantage of the essay examination should be mentioned.
Several experimental studies have shown that the type of measurement
used by the teacher influences the type of study procedures employed by
the pupils. When pupils expect the test to be of the essay type, in whole
or in part, they seem more likely to employ such desirable study techniques
as making outlines and summaries, and seeking to perceive relationships
and trends, than is done when objective tests are used exclusively.

The practical conclusion follows that neither the essay nor objective test
should be used exclusively. From Lee and Segel's²⁰ analysis of the
measurement practices of 1,000 secondary school teachers, distributed widely over
the United States, and from the Hensley-Davis study²¹ it appears that
teachers favor the use of a combination of the two types. It is encouraging
that the practice of more and more teachers seems to be governed by the
sound philosophy of measurement stated by Lindquist in the following
sentence:²²

The intelligent point of view is that which recognizes that whatever advantages
either type may have are specific advantages in specific situations; that while
certain purposes may be best served by one type, other purposes are best served
by the other; and above all, that the adequacy of either type in any specific
situation is much more dependent upon the ingenuity and intelligence with which
the test is used than upon any inherent characteristic or limitation of the type
employed.


C. Suggestions for Improving Essay Examinations

Although the essay examination has been in existence for hundreds of
years, the amount of research devoted to it is much less than that devoted
to the objective test, which is comparatively new. Furthermore, practically
all the research relating to the former has been of a negative kind. Its
purpose has been to show how poor unimproved essay examinations are, rather
than to devise means for their betterment. However, a study of the meager
experimental literature does yield several positive suggestions. The next
two sections will be devoted to a consideration of some of the most
promising of these suggestions.

¹⁹ Helen F. Olson, "Evaluating Growth in Language Ability," Journal of Educational
Research 39:247, December 1945.

²⁰ … United States Office of Education …

²¹ … "… Teachers Think and Do About Their Examinations," Educational
Administration and Supervision 38:219 …

²² … Achievement Examinations, page …

Improving the construction and use of essay examinations. It is
just as important to know where to use the essay examination as it is to
know how to use it. It is wise to restrict the use of the essay test to the
measurement of those functions for which it is best adapted. There would
usually appear to be no good reason for employing subjective measurement
where objective measurement is adequate. What, then, does the essay
examination attempt to do?

Weidemann²³ recognizes eleven definable types of improved essay
examinations. Arranged in a series from simple to complex, these types are
as follows: (1) what, who, when, which, and where; (2) list; (3) outline;
(4) describe; (5) contrast; (6) compare; (7) explain; (8) discuss; (9) develop;
(10) summarize; and (11) evaluate. The first two types seem hardly
distinguishable from recall tests of the objective type. Many years ago Monroe
and Carter²⁴ made a very suggestive classification of thought questions
into twenty types. These types, together with an illustration of each, taken
from the field of measurement, appear below.

Thought Questions

1. Selective recall — basis given

Name three important developments in measurement which occurred during
the first decade of the twentieth century.

2. Evaluating recall — basis given

Name the three persons who have had the greatest influence on the
development of intelligence testing.

3. Comparison of two things — on a single designated basis

Compare essay examinations and objective tests from the standpoint of
their effect upon the study procedures used by the learner.

4. Comparison of two things — in general

Compare standardized and non-standardized tests.

5. Decision — for or against

In which, in your opinion, can you do better, oral or written examinations?
Why?


²³ C. C. Weidemann, "Written Examination Procedures," Phi Delta Kappan 16:
78-83, October 1933; also C. C. Weidemann, "Review of Essay Test Studies,"
Journal of Higher Education 12:41-44, January 1941.

²⁴ Walter S. Monroe and Ralph E. Carter, The Use of Different Types of Thought
Questions in Secondary Schools and Their Relative Difficulty for Students, 26 pages.
Urbana, Illinois: Bureau of Educational Research Bulletin Number 14, University of
Illinois.





6. Causes or effects

… tests during the …

7. Explanation of the use or exact meaning of some phrase or statement in a
passage

Explain the meaning of "Delta" in the verse quoted on page 193.

8. Summary of some unit of the text or of some article read

9. Analysis (The word itself is seldom involved in the question.)

Why are "progressive educators" suspicious of standardized tests?

10. Statement of relationships

Why is it that nearly all essay examinations, regardless of the school subject,
tend to be measures of the learner's mastery of English?

11. Illustrations or examples (your own) of principles in science, construction
in language, etc.

Give two original examples of specific determiners in objective tests.

12. Classification (usually the converse of No. 11)

What type of error appears in this test item? "With what Balkan country
did the Allies fight in World War I?"

13. Application of rules or principles to new situations

In the light of China's experience with state examinations, what would you
expect to be the effect of the Regents' Examinations in New York?

14. Discussion

Discuss the place of measurement in science.

15. Statement of aim — author's purpose in his selection or organization of
material

In view of the author's discussion on pages 19 and 20, why are so many
authorities quoted in Chapter 1?

16. Criticism — as to the adequacy, correctness, or relevancy of a printed
statement, or a classmate's answer to a question on the lesson

Criticize or defend the statement: "The essay examination overrates the
importance of knowing how to say a thing and underrates the importance
of having something to say."

17. Outline

Outline the principal steps in the construction of an informal teacher-made
test.

18. Reorganization of facts (a good type of review question to give training in
organization)

Name ten practical suggestions from Chapters 4, 5, and 6 that are
particularly applicable to the subject you teach or plan to teach.

19. Formulation of new questions — problems and questions raised

What are some problems relating to the use of essay examinations that
require further study?

20. New methods of procedure

Suggest a plan for proving the truth or falsity of the contention that
exemption from examinations is a good policy in high school.




Special advantages. It will be noted that the classifications by
Weidemann and by Monroe and Carter recognize a considerable number of rather
distinct abilities, which are measurable by essay tests. It is probably best
to measure each one separately rather than to attempt to measure several
of them by the same test. It will be further noted that the emphasis in most
of these types is upon organization, relationship, evaluation, application,
or some similar ability to which a purely objective test may be poorly
adapted. Teachers should study each type of essay question carefully until
they are familiar with its distinguishing characteristics. If a proposed essay
question does not seem to conform to one of these types, it had usually
better be reworded or adapted to some form of the objective test. No
question should be included until its purpose has been clearly defined.

The essay examination would appear to be particularly valuable in two
situations. The first of these is obviously in such courses as English
composition and journalism, where the student's ability to express himself
effectively is the major objective of instruction. The second situation is in
advanced courses of other subjects, where critical evaluation and the
ability to assimilate and organize large amounts of material constitute
important objectives. In this connection it is significant to note that Jones²⁵
found that 68 per cent of the college students who took senior
comprehensive examinations and 55 per cent of the superior students in other colleges
stated their views as follows: "I think one's ability is far better shown
through discussion questions than through short objective questions."

There is some evidence that a more valid sampling of the pupil's
knowledge is afforded by increasing the number of questions and reducing the
length of discussion expected on each. In many cases a well-constructed
paragraph is sufficient. Very few discussions need exceed one or two pages
in length. In any case, the question should be so worded as to restrict the
responses toward the objective which it is desired to measure. For example,
Wrightstone²⁶ suggests that the question, "Explain the reasons for the strike
at General Motors in 1937," is too general, and would be improved if it
were restricted by the addition of the phrases "to show (a) the labor
grievances of the employees, (b) the practices of the employer, (c) related
national, social, and economic factors, (d) the rival labor unions, and (e) the
method of striking." It must be recognized, however, that such
suggestions, at least in part, take from the essay examination its uniqueness.
The proposed modifications may appear to improve the reliability of the
traditional examination by the obvious device of making it more like the
objective test.
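
The suggestion above to increase the number of questions can be given a rough numerical form. The text itself does not cite a formula; the sketch below uses the standard Spearman-Brown formula, which assumes that the added questions are comparable in quality and content to those already on the test. The function name `spearman_brown` is hypothetical, and the starting value of .43 is borrowed from the McGregor and Ruch figure quoted earlier purely as an illustration.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a test `length_factor` times.

    reliability   -- reliability coefficient of the present test (0 to 1)
    length_factor -- new length divided by present length (2 = doubled)
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


# Illustrative only: an examination whose two forms correlate .43,
# lengthened to two, three, and four times its present number of questions.
for k in (1, 2, 3, 4):
    print(k, round(spearman_brown(0.43, k), 2))
```

The predicted gain shrinks with each added block of questions, so lengthening an essay examination helps most when the original test is short, and it cannot by itself remove the unreliability that arises in marking.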


²⁵ Edward Safford Jones, Comprehensive Examinations in American Colleges, page 373.
New York: The Macmillan Company, 1933.

²⁶ J. W. Wrightstone, "Are Essay Examinations Obsolete?" Social Education 1:403,
September 1937.




One of the difficulties with constructing essay tests is that the process
appears so easy. As a matter of fact, it is probably more difficult to construct
essay tests of high quality than it is to construct objective tests of high quality.
Much care and thought must be given to their construction if tests of any
kind are to measure anything but mere memory for factual knowledge.
Many of the general principles of testing outlined in an earlier chapter are
as applicable to essay tests as to objective tests. There is always risk that
in attempting to phrase essay questions so that they can be scored more
objectively, the result may be made less satisfactory than an out-and-out
objective test. Critical revision, utilizing if possible the judgment of a
colleague, is especially important.

Preparing students to take essay examinations. Some writers have
emphasized the importance of training pupils in taking examinations. Worcester²⁷
suggests that the essay examination is "obviously invalid and unfair,"
at least in part because the pupils are being required to take a test on a
type of work for which they have had no specific training. The rational
solution offered is to supply the necessary training rather than to abandon
the essay examination. Wider experience and training in preparing for and
in taking tests of all types is likely to increase the accuracy of measurement.
Edmiston²⁸ prepared instructions to pupils for taking examinations which
were far more elaborate than the usual directions accompanying tests.
He found that the use of these instructions increased the validity of the
examinations and produced definite improvements in students' records
of achievement from examinations. It would appear wise to provide
instruction of this sort in the regular program of studies. It is most
unfortunate when a pupil fails to receive recognition for knowledge he actually
possesses simply because he has not mastered the technique of putting it
on paper. Edmiston's suggestions,²⁹ given below, will prove helpful in
planning such a program of instruction.

IMPORTANT CONSIDERATIONS IN TAKING EXAMINATIONS

1. Your name should appear on the first or last sheet of the examination if
sheets are securely bound. Each loose sheet should have the name entered
inconspicuously, preferably on back where it will not be seen by the scorer
when scoring.

2. Write legibly. Your answer can't be right if it can't be read. Be sure
your pen or pencil (if allowed) fosters distinct and not blurred writing.

3. Use terms or a vocabulary suited to the subject. Do not use a word unless
its meaning is clear to you, and repeat a word rather than use another which
may not have exactly the same desired meaning.

²⁷ D. A. Worcester, "On the Validity of Testing," School and Society …

²⁸ R. W. Edmiston, "Examine the Examination," Journal of Educational Psychology
30:126-138, February 1939.

²⁹ Ibid., pages 137-138.




4. Space (the back of sheets, the maiEins. or an e.atra sheet! should be used tor 

o. computations. u j l ^ r 

6. practice in the formation of desirable statements, not padded but lur- 
nishing quality rather than quantity to the answer, 
c. the hasty jotting of tacts pertaining to some questions when these facts 
arise, while working upon another question. 

5. The statement of each question must be fully considered. Carelessness not 
only penalizes the student but also lowers the dependability of the measure- 
ment obtained by the instructor. 

6. The directions trlling how to answer the questions should be carefully 
followed. Underscore the important noints in the directions. 

7. In essay questions, underscore the part of the statement that furnishes the 
direct question asked. Then underscore any parts of the statement which 
furnish data for the answer. Number each part so that you will not omit 
anything from your answer. 

8. Proceed directly through the e.xaminalion with no lengthy consideration of 
unfamiliar points. After completing the parts which were readily answered, 
start again and answer those questions which jield to more diligent effort. 
Do not waste time by trial and error method upon questions which bring 
no recognition or recall of related materials. After completing the second 
consideration of the test, spend the remainder of the time upon the more 
familiar of the unanswered questions. Note that hesitation wastes time, 
ruins confidence, and destroys mind-set. 

9. If after thorough consideration you do not understand some direction or 
question due to other than lack of knowledge of the course, call the attention 
of the person in charge with as little disturbance as possible in order that 
the tester may come to your seat or allow you to come to him as conditions
may determine. 

10. Reread each answer before passing to the next question and the completed 
examination before delivery to the instructor. Is the meaning clear and 
writing legible? 

By way of summary, three important suggestions for the construction
and use of essay examinations are as follows:


1. Restrict the use of the essay examination to those functions to which
it is best adapted. When it is not clear that the essay type is required for
measuring the desired objective, use the objective test.

2. Increase the number of questions asked and reduce the amount of
discussion required on each. Always indicate clearly the type of discussion
desired.

3. Make definite provisions for teaching pupils how to take examinations.
Specific training in preparing for and in taking tests and examinations
of the various types commonly encountered is a legitimate objective
of instruction.


Improving the scoring or grading of essay examinations. Rinsland*
makes a distinction between the terms scoring and grading. Scoring

* Rinsland, Constructing Tests and Grading in Elementary and High
School Subjects, page 302. New York: Prentice-Hall, Inc., 1938.





is an objective process of counting right or wrong responses, whereas
grading always means interpreting quality in terms of some criterion. Strictly
speaking, then, it is more correct to speak of grading or rating essay
examinations than it is to speak of scoring them.

It is, of course, apparent that whatever claims are made for the validity
of the essay test as a measuring instrument are conditioned upon the
assumption that the papers can be read accurately. Not only must the essay
test, for example, call forth from superior pupils responses which are
consistently superior, but the teachers marking the papers must be able
consistently to recognize that they are superior responses. The same is true
of responses with other degrees of merit. The grading of the essay
examination, therefore, occupies a strategic position.

To begin with, certain preventive measures are important. A careful
wording of the questions and directions to the pupil which indicate clearly
just what type of response is expected will simplify the problem of marking
the papers. The use of optional questions should be discouraged. The simple
precaution of having the pupil record his name inconspicuously, either on
the back or at the end of the paper, is likely to increase the accuracy with
which the paper is graded.

Cochran and Weidemann* outline a procedure for evaluating essay
examinations, the essentials of which can be taught in ten minutes. This is
shown by the fact that the majority of the consistency coefficients of two
series of scorings [...] training [...]. Independent scores by experienced
teachers [...] average agreement of .98 when the procedure given below,
in a slightly modified and abridged form, was used:

1. I read over a sampling of the papers to obtain a general idea of the grade of
answer I may expect.

2. I score one question through all of the papers before I consider another
question. I have found two outstanding advantages in scoring the entire set
of papers in this way: the comparison of answers appears to make the grades
more exact, and having to keep only one list of points in mind promotes
accuracy.

3. Before scoring any papers, I read the material in the text which covers the
questions and also the lecture notes on the subject.

4. I make a list of the main points which should be discussed in every answer.
Each of these points is weighed and assigned a certain value if the scoring
is to approach objectivity. The total value assigned to the main points needed
for a reasonably adequate answer is designated as the minimum score. If a
pupil elaborates and discusses points not required yet pertinent to the
question, his answer is given an additional score. This additional score may
vary for different pupils but may not exceed a certain set maximum.

— , , . Essav Type of Examination op nl p i?op '>05-506 

»JohnM VVeideSm op ctl 

« Uoj I Cochran and v,' 




5. After the points have been weighed, the actual scoring begins. I read the
answer through once and then check back over it for fact details. I attempt to
mark every historical mistake on the paper and write in briefly the correction.
As I read the answer I make a mental note of the points omitted and the value
of each point, so that when the end of the question is reached, I have the
minimum grade figured. If there is any additional or extra percentage to be
given, it is added to the minimum score, and then the value of the question is
written in terms of the per cent deducted rather than the positive per cent.
Then when every question on a paper is scored, it is a simple matter to add the
negative quantities and obtain the final grade.
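
The bookkeeping in the procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the point names, their values, and the rule that extra credit may offset a deduction but never push it below zero are hypothetical and are not taken from Cochran and Weidemann.

```python
def question_deduction(point_values, covered, extra=0, max_extra=0):
    """Return the credit deducted on one question.

    point_values -- dict mapping each main point to its assigned value
                    (hypothetical values chosen by the reader in advance)
    covered      -- set of main points the pupil actually discussed
    extra        -- credit for pertinent material beyond the required points
    max_extra    -- assumed ceiling on that extra credit
    """
    full_value = sum(point_values.values())
    earned = sum(value for point, value in point_values.items() if point in covered)
    earned += min(extra, max_extra)
    # Assumption: extra credit can cancel a deduction but not create a surplus.
    return max(full_value - earned, 0)


def final_grade(deductions, full_marks=100):
    """Add the negative quantities and subtract them from full marks."""
    return full_marks - sum(deductions)


# Hypothetical two-question paper worth 50 points in all.
question_1 = {"cause": 10, "event": 10, "result": 5}
question_2 = {"definition": 15, "example": 10}

deductions = [
    question_deduction(question_1, {"cause", "event"}),         # "result" omitted
    question_deduction(question_2, {"definition", "example"}),  # nothing omitted
]
grade = final_grade(deductions, full_marks=50)
```

Recording each question as a deduction, as the quoted procedure does, means the reader only has to total the losses at the end of the paper.
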

It is difficult to overemphasize the importance of three things: (1) the
preparing in advance of a list of answers which are considered adequate
for the objectives of the test; (2) the assigning of a specific value to each
essential part of the answers; and (3) the grading of one question through
all the papers before going on to another question. Most students of the
problem recommended attempting to distinguish a relatively small number
of degrees of merit in an answer. Perhaps as good a plan as any is to allow
credit for each part of the answer considered essential to a question as
follows: 3 for superior, 2 for average, 1 for inferior, and 0 for an omission or
wrong reply. Stalnaker* found that the weighting of essay questions was
of negligible value, the correlation between weighted and unweighted
scores on the College Entrance Board Examinations varying from .97
to .997.
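
The 3-2-1-0 plan, and the comparison of weighted with unweighted totals, can be tried out on a small scale. The sketch below uses invented ratings for five papers and an invented set of weights; it is not Stalnaker's data, and it only shows how such a comparison might be set up.

```python
from math import sqrt

# Credit allowed for each essential part of an answer, as in the text.
RATING = {"superior": 3, "average": 2, "inferior": 1, "omitted": 0}


def total_score(ratings, weights=None):
    """Sum the 0-3 credits over the parts of one paper, optionally weighted."""
    if weights is None:
        weights = [1] * len(ratings)
    return sum(w * RATING[r] for w, r in zip(weights, ratings))


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings of three essential parts on five papers.
papers = [
    ["superior", "average", "average"],
    ["average", "average", "inferior"],
    ["inferior", "omitted", "average"],
    ["superior", "superior", "average"],
    ["average", "inferior", "inferior"],
]
weights = [3, 2, 1]  # hypothetical weights for the three parts

unweighted = [total_score(p) for p in papers]
weighted = [total_score(p, weights) for p in papers]
r = pearson_r(unweighted, weighted)
```

With these made-up figures the two sets of totals rank the papers in the same order, which is the kind of result that makes elaborate weighting hard to justify.
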

Grading by sorting. In addition to the points made by Cochran and
Weidemann, several authorities have found another suggestion helpful.
The suggestion is to make a sorting of the papers into three to five piles,
according to the merit of the discussion of each question on the basis of a
brief preliminary examination of the answers. Sims describes one such
procedure as follows:*

1. Quickly read through the papers and on the basis of your opinion of their
worth sort them into five groups as follows: (a) very superior papers, (b)
superior papers, (c) average papers, (d) inferior papers, (e) very inferior papers.

2. Reread the papers in each group and shift any that you feel have been misplaced.

Flanagan* has shown that the optimal percentages for five groups are
9, 20, 42, 20, and 9. Therefore, about 10 per cent of the papers might be
called “very superior” and 10 per cent “very inferior.” Twenty per cent


* John M. Stalnaker, “Weighting Questions in the Essay-Type Examination,”
Journal of Educational Psychology, 29: 481-490, October, 1938.

* Verner M. Sims, “The Objectivity, Reliability, and Validity of an Essay
Examination Graded by Rating,” Journal of Educational Research, 24: 216-223,
October, 1931.

* John C. Flanagan, “The Effectiveness of Short Methods for Calculating
Correlation Coefficients,” Psychological Bulletin, 49: 342-348, July, 1952.




would be “superior” and a like percentage would be “inferior.” The
remaining 40 per cent are “average.” These are rough approximations of
the optimal percentages.

The preliminary sorting of the papers into piles of approximately equal
merit before assigning numerical values to them will help to avoid the
difficulty pointed out by Stalnaker, namely, that the values allowed a
paper are often greatly influenced by the merit of the paper which happens
immediately to precede it in the order of scoring. It is also easier to locate
papers distinctly out of line with those in a particular group supposedly
of similar quality. It is a good idea to throw the papers into a single group
after each question has been evaluated and before they are re-sorted into
piles according to the merits of the discussions of the next question. This
procedure will make it easier to conceal the identity of the particular pupil
whose paper is being judged and so to avoid one of the most disturbing
factors in marking essay examinations.
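
For a reader who wants a rough check on the size of each pile, the rounded percentages (10, 20, 40, 20, 10) can be turned into approximate counts. The sketch below is an illustration only: the function names are hypothetical, the papers are assumed to have been ranked already by the reader's own judgment, and giving any rounding remainder to the middle pile is an arbitrary choice.

```python
# Rounded shares for the five piles, in per cent.
PILE_PERCENTS = [
    ("very superior", 10),
    ("superior", 20),
    ("average", 40),
    ("inferior", 20),
    ("very inferior", 10),
]


def pile_quotas(n_papers):
    """Approximate number of papers per pile for a class of n_papers."""
    quotas = {name: n_papers * pct // 100 for name, pct in PILE_PERCENTS}
    # Assumption: papers left over after rounding down go to the middle pile.
    quotas["average"] += n_papers - sum(quotas.values())
    return quotas


def sort_into_piles(papers_best_first):
    """Split a list of papers, already ranked best first, into the five piles."""
    quotas = pile_quotas(len(papers_best_first))
    piles, start = {}, 0
    for name, _ in PILE_PERCENTS:
        piles[name] = papers_best_first[start:start + quotas[name]]
        start += quotas[name]
    return piles


quotas_for_30 = pile_quotas(30)
piles = sort_into_piles(list(range(1, 31)))  # papers numbered 1 (best) to 30
```

The counts are a guide and not a rule; the rereading step in the procedure quoted from Sims still decides where a doubtful paper belongs.
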
The school should adopt a policy regarding what factors shall be
considered, and what factors shall not be considered, in evaluating a written
examination. Only those factors should be taken into account which afford
evidence of the degree to which the pupil has attained the objectives set up for
that particular course. Except in English classes this will rule out making
arbitrary reductions for such things as faulty sentence structure,
paragraphing, handwriting, and the spelling of nontechnical words. These
factors will be considered only in so far as they affect the clarity of the
pupil's discussion. It is always legitimate to hold the pupil responsible
for the spelling, as well as the meaning, of the vocabulary which is peculiar
to the course.

This does not mean that the quality of the written English used in
examinations is unimportant and should therefore be disregarded. On the
contrary, it is always very important. But it should be considered only in
relation to that for which it may be accepted as valid evidence, namely,
in determination of the pupil’s mark in English. Where the teacher has
complete charge of an entire grade, this adjustment is easy to make. But
where the school is departmentalized the problem is more difficult. Even
here it should be possible to work out a system whereby at intervals the
papers in other subjects, after having been graded as to content, may be
turned over to the English teacher to be judged from the viewpoint of their
merits as English compositions. In this way it may be possible to sample
the pupil’s characteristic performance in written English better than when
he writes a paper specifically for the English teacher. And, what is equally
important, it makes the pupil’s mark in other subjects a measure of
achievement in those subjects rather than partly a measure of skill in English
composition.




Selected References for Further Reading 
College Board Review, “Reading Conference,” No. 19: 324-326, February, 1953.

Cook, Walter W., “The Functions of Measurement in the Facilitation of Learning,”
Chapter 1 in E. F. Lindquist (Editor), Educational Measurement. Washington,
D. C.: American Council on Education, 1951.

Ellis, Albert, “An Experiment in the Rating of Essay-Type Examination Questions
by College Students,” Educational and Psychological Measurement, 10: 707-711,
Winter, 1950.

Flanagan, John C., “The Use of Comprehensive Rationales in Test Development,”
Educational and Psychological Measurement, 11: 151-155, Spring, 1951.

Henry, Nelson B. (Editor), “The Measurement of Understanding,” Forty-Fifth
Yearbook of the National Society for the Study of Education, Part I. Chicago:
University of Chicago Press, 1946. 338 pages.

Kostick, Max M., and Nixon, Belle M., “How to Improve Oral Questioning,”
Peabody Journal of Education, 30: 209-217, January, 1953.

Odell, C. W., How to Improve Classroom Testing. Dubuque, Iowa: Wm. C. Brown
Company, 1953. Chapters “Discussion of Essay Examinations” and
“Short-Answer Tests: General.”

Remmers, H. H., and Gage, N. L., Educational Measurement and Evaluation.
New York: Harper & Brothers, 1943. Chapter XII, “Essay Testing.”

Stalnaker, John M., “The Essay Type of Examination,” Chapter 13 in E. F.
Lindquist (Editor), Educational Measurement. Washington, D. C.: American
Council on Education, 1951.

Torgerson, Warren S., and Green, Bert F., Jr., “The Factor Analysis of Subject-
Matter Experts,” Journal of Educational Psychology, 43: 354-363, October, 1952.



PART III 

The Testing Program 



8 


Steps in the Testing Program


Dear Professor:

I have decided to give some tests in school this fall. Please suggest a few
good ones I might try. Also let me know where to get them and what they will
cost.


Dear Professor:

We gave the Up-to-Date General Achievement Tests at the beginning of the
school year. As we now have most of them scored, please advise me how to use
the results so as to get the most good out of them. Any help will be greatly
appreciated.


Probably every college professor who offers courses in measurement has
received letters like those above. They indicate that some school is
undertaking, or has already undertaken, to use standard tests without
understanding what it is all about. Always, testing should have a program to
guide it.* What, then, is a “testing program”?

General considerations. The word “program” has certain important
implications, such as order, system, planning. It implies a sequence of events
that has been determined upon after careful thought, rather than some
haphazard, hit-or-miss affair. One of the chief weaknesses of many
attempts to use standard tests is that there has been no program worthy
of the name. The whole procedure has simply led a precarious hand-to-
mouth existence from beginning to end.
Spence* has suggested that “a good testing program should be
supplementary not duplicative, usable not confusing, economical not
burdensome, comprehensive not sporadic, suggestive not dogmatic, progressive
not static.” Such a program, at least in tentative form, may very well cover
an extended period, rather than be adopted piecemeal year by year. One
advantage of this long-range planning is that it makes possible a varied
program without leaving gaps or involving needless duplication. Stenquist,*
speaking from wide experience, strongly advocates “some sort of
systematically recurring schedules as opposed to sporadic testing,” since
schedules make possible “enormously greater gains” from testing. Spence offers
for elementary schools what he calls “a conservative approach to the problem.”
This program is given in Table 26.

* Julian C. Stanley, “Standardized Tests and Educational Objectives,” Peabody
Journal of Education, 28: 218-221, January, 1951.

* Ralph B. Spence, “A Comprehensive Testing Program for Elementary Schools,”
Teachers College Record, 34: 27&-284, January, 1933.

TABLE 26

Plan for a Testing Program for the Elementary School (after Spence)

Grade

Intelligence*

Achievement Battery, Annual
(Given in Mar. or Apr.)

Achievement Tests for Special Emphasis,
All Grades from 3 or 4 to 8, Rotating
(Given in October)

Kdg-I
II
III
Two Group Tests
Reading Battery
Skill Subjects Battery
Year: Reading

IV
One Group Test
Complete Battery

V
VI
Complete Battery

VIII
Complete Battery
Usage and Spelling
Fifth Year: Reading, etc.

* Retests for special cases as needed, preferably with an individual test.
All dates based on groups beginning a grade in September. Teachers use diagnostic
tests throughout the year.


It will be noted that this program calls for the use of both intelligence
tests and achievement tests, and for the use of test batteries as well as of
tests in the separate subjects. It is also expected that the program will
merely supplement rather than supplant the ordinary informal tests and
examinations made by the classroom teacher. A slight modification of the
schedule as presented would involve giving a general test battery in all
subjects about every third year, and an intensive program limited to one
subject in each of the intervening years. The cost of such a program of
standard tests would be less than twenty-five cents per pupil per year.
If the tests are intelligently used, it is doubtful whether greater returns
can be had by the school from the same amount of money spent in any
other way.

* John L. Stenquist, “Recent Developments in the Uses of Tests,” Review of
Educational Research, 3: 60, February, 1933.











Traxler,* in a discussion of the planning and administration
of the testing program, divides tests into two broad categories. The
first includes group tests of intelligence and achievement tests in the major
subject-matter areas. These should be administered at regular intervals to
every normal pupil in the school. The second category includes individual
intelligence tests, special aptitude tests, personality tests, and tests of
vocational interest.


The following comprehensive “Platform for the Use of Standard Tests”
has been prepared by a committee of Massachusetts teachers:*


1. Scientific measuring instruments and the scientific method are badly needed in
present educational practice. No business of the financial magnitude of
education spends so little time and money for objective and scientific fact finding.

2. Standardized tests and measurements can fulfil their function of giving
direction and efficiency to education only when used intelligently by teachers and
administrators who have kept abreast of current knowledge on the subject
and who are willing to follow the authors' directions for the administration
and scoring of the tests used. The results of tests in which directions are not
followed are worse than useless; they are misleading.

3. Every standardized test administered should be given for a specific purpose,
and having been given, its results should be used. Tests which are administered,
scored, and piled in a cupboard serve no useful purpose.

4. Standardized tests can be used most efficiently when their use is planned over
a long period of time.

5. Standardized tests have furnished valuable information to the school
administrator in practically every instance in which they have been used. The
possibilities of diagnostic tests in improving instruction through analysis and
diagnosis of individual and class weaknesses have not nearly been realized.
Tests are of the greatest value when their results cause a teacher to redefine
his objectives, alter his methods, and redirect his emphasis as a result of new,
increased, and more exact knowledge about his pupils.

6. If standardized test results are to be used in measuring the efficiency of
instruction, the conditions of scientific experimentation must prevail, with all
contributing factors defined, measured, and controlled. Failure to observe these
conditions often results in teaching for test results alone, which not only
invalidates any results which may be obtained but also neglects some of the
most desirable outcomes of good teaching which cannot be measured by tests.
On the other hand, standardized test results cannot be ignored. They can be
of great help to an administrator in judging a teacher's work, but they cannot
be used as a substitute for classroom visiting, supervision, and critical
subjective analysis.

7. No important decision regarding the placement of an individual pupil should
be made on the basis of the result of one test of any kind. Educational
achievement, mental age, IQ, chronological age, health, teacher's judgment,
physical development, social age, and emotional maturity are all factors to be
considered in individual placement or any plan for grouping.

^ Arthur E Traxler Planning and Admmistemg a Te tmg Program SMRmeui 

* “The Use of Standardized Tests in Massachusetts,” Test Service Bulletin No. SS,
published by World Book Company, 1938.





8. The content or items of a standardized test should never be used as material
for class presentation and drill, either before or after the administration of the
test. To reproduce any part of the test, either on paper or the blackboard,
is not only a violation of the publisher’s copyright, but will invalidate that test
for future use in the school. For this reason, all copies of standardized tests
should be accounted for, and extra copies should not ordinarily be left in the
hands of the classroom teacher.

9. The IQ or mental age obtained from one group test of intelligence is less
reliable than an average of IQ's or mental ages obtained from the results of
two or more group tests of intelligence. An individual test of intelligence is
more valid and reliable than group tests only when it is administered by a
skillful and well-trained psychometrist.

10. The use of standardized tests and a knowledge of the methods used in their
construction should result in an improvement in teacher-made measures of
achievement.

One must not assume that the testing program should be restricted to
the use of standard tests. As has been explained in the three preceding
chapters, informal or teacher-made tests will have a large place in any complete
testing program. Schools should have a carefully thought-out general policy
on such matters as the frequency of testing, the importance of final
examinations, the factors to be considered in determining final marks, and, most
important of all, the uses to be made of the results.
Regardless of its scope, the complete testing program at any particular
time will ordinarily consist of the following eight steps, or stages, in
chronological order:

1. Determining the purpose of the program.

2. Selecting the appropriate test or tests.

3. Administering the tests.

4. Scoring the tests.

5. Analyzing and interpreting the scores.

6. Applying the results.

7. Retesting to determine the success of the program.

8. Making suitable records and reports.

A. Determining the Purpose of the Program 

It must be recognized at all times that tests are only tools, and that
measurement is always a means to an end, never an end itself. In the final
analysis, then, the value of any testing program depends upon the use made
of results. Unless something is going to be done about it in the end, there
is no point to beginning. Merely “giving tests” without rhyme, rule, or
reason is money, time, and effort wasted. The author once heard an
experienced educator say that he had wondered for years what many people
did with standard tests after they had been “given.” At last he found out.
They filed them! The testing program should have a more serious purpose.





The first step in the testing program is to determine
its purpose. In so doing, three things should be kept in mind:


1. It should be co-operative.

2. It should be practical.

3. It should be definite.


A co-operative program. As a rule, the program should not represent
the judgment of any one person alone, but that of a group. It should be a
truly co-operative enterprise. The teachers and administrative officers alike
should be made to feel that it is “our” program, as, indeed, it should be.
This is not likely to be the case, however, if the principal, superintendent,
or a research department determines the program and then “hands it down”
to the classroom teachers. The entire staff should have a voice in
determining the purpose of the program and in formulating the plans, and all
should have the opportunity of participating in it in every way possible
from beginning to end. If this is not done, the teachers are not likely to
understand the program fully or to appreciate what it is attempting to do.
Without the hearty co-operation of the entire staff, from the superintendent
to the youngest teacher, the program is almost sure to fall short of its
highest possibilities. It is suggested, therefore, that in the small school or
school system the purpose of the program be decided upon after discussion
in a general teachers' meeting or series of meetings in which everyone has
a chance to participate. In the larger school systems it is better to entrust
a committee representing all interested groups with the responsibility of
planning the program. Even then it should be brought before the entire
staff before final action is taken. It cannot be emphasized too strongly
that the success of the program largely depends upon co-operative action.
An important part of the program, therefore, is the educating of the staff
so that they can participate intelligently in it. Boyer emphasizes the fact
that the teacher’s attitude is the most important factor to be considered
in any plan, for “what she thinks, and what she does as a result of her
thinking, determines the success or failure of the plan.”

A practical program. The general purpose of the testing program is
to provide data which will help in the solution of some practical school
problem. As a rule, this means that the problem whose solution is sought
will have to do with administration, instruction, or research, or with some
combination of these three. Even when tests are used primarily for
administrative purposes, such as classification, they can also be used by the
classroom teachers for diagnostic purposes. Unless the school has had
considerable experience with testing, it will be better not to undertake a
program primarily for research, although under favorable conditions research


* Thirty-Fifth Yearbook of the National Society for the Study of Education, Part I,
page 21J. Bloomington, Illinois: Public School Publishing Company, 1936.





is a legitimate interest both of classroom teachers and of administrators.
Even when the program is undertaken for research purposes, it should
ordinarily be one which bears directly upon some practical issue in the
school, such as determining the relative efficiency of different teaching
methods or of administrative organizations.

A definite program. It is not enough that the program be co-operative
and practical. It must also be definite. The scope of the program may vary
all the way from a single subject in one grade to a complete measurement
of the entire school system. A common mistake of a staff inexperienced in
the use of tests is to undertake too much. The danger then is that the
program will drag along until everybody is more or less “fed-up” with it.
Much of the value of the information sought from the tests will be lost
unless the information is made available without delay. It is usually best,
particularly with inexperienced teachers, to run the risk of undertaking too
small a program rather than one too large.
Another mistake is in stating the purpose of the program in too general
terms. “To improve instruction” is too vague and inclusive. “To motivate
study” or “to diagnose weaknesses and provide a basis for remedial
instruction” would be better. Best of all would be a still more definite
formulation, such as “to motivate study in fifth-grade arithmetic” or “to make
a diagnosis of characteristic weaknesses in first-year algebra and to
formulate a program of remedial teaching to strengthen them.” The purpose
should state specifically both the nature and the scope of the program to be
undertaken. Later chapters will discuss in some detail important
administrative and instructional problems which tests may help to solve. In a
long-time program the purpose for each year will have a definite
relationship to the whole. No matter how stated, however, there is really one
fundamental purpose in all measurement, namely, the better understanding
of the individual pupil. To accomplish this purpose the information must
be as definite and as complete as possible.

B. Selecting the Appropriate Test or Tests

When the purpose of the testing program has been determined, and not
until then, the selection of the test, or tests, is in order. In Chapter 4
attention was called to the fact that a test may be superior for one purpose
and worthless for another. Great care must therefore be exercised in order
to secure the tests most appropriate for the purpose. Three questions
require consideration:

1. Who shall select the test or tests?

2. What type of tests shall be used?

3. What is the best procedure in making the selection?

Who shall select the tests? The best qualified person, or persons,
available should make the selection. In larger school systems the director




of research is usually that person. But, even then, in the selection of
achievement tests for specific subjects, the teachers of these subjects should
assist in judging the curricular validity of the tests. In smaller schools the
major responsibility is usually entrusted to the principal or superintendent.*
However, in the selection of achievement tests a committee of teachers will be
helpful in judging the content of the tests. It is a sound principle in all
evaluation that involves a subjective element to rely, whenever possible, upon
the combined judgment of a group of competent persons rather than upon the
judgment of any one individual.


What type of tests shall be used? Ordinarily an adequate testing
program will involve the use of more than one type of test. It will be
desirable, except in a few cases such as in the beginning of the kindergarten or
first grade, to use both intelligence and achievement tests. If considerations
of time and money make it advisable to limit the testing program to one
standard test for determining the present status of the class or school, the
best choice will usually be a test battery.*

For a general survey of the intellectual status of the class or school,
a good group test of intelligence will suffice, although as a rule an average
of two is better than one alone. In any measurement of intelligence
involving group tests, especially if only one test is used, it is desirable to have
retested with an individual intelligence test, such as the Revised Stanford-
Binet or the Wechsler-Bellevue, the following pupils: those who test very
low, say below an IQ of 80; those who test very high, say above an IQ
of 130; or those whose scores are considerably out of line with the judgment
of the teacher. The Revised Stanford-Binet is particularly trustworthy at
the low IQ levels. The distinctive advantage of the individual intelligence
test is the opportunity afforded for the examiner to observe the behavior
of the child under standardized conditions. As a diagnostic instrument such
a test is likely to be much superior to the group test. Pupils who have
language difficulty should be tested individually, perhaps with a
performance test.
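
The rule for choosing pupils to retest individually can be written out as a simple check. In this sketch the limits of 80 and 130 come from the paragraph above; the idea of expressing the teacher's judgment as an estimated IQ, and the 15-point gap taken to mean "considerably out of line," are assumptions made only for illustration.

```python
def needs_individual_test(group_iq, teacher_estimate=None, low=80, high=130, gap=15):
    """Return True if a pupil should be retested with an individual test.

    group_iq         -- IQ obtained from the group test
    teacher_estimate -- teacher's judgment expressed as an approximate IQ
                        (hypothetical; omit if no estimate is available)
    low, high        -- limits suggested in the text
    gap              -- assumed size of a "considerable" disagreement
    """
    if group_iq < low or group_iq > high:
        return True
    if teacher_estimate is not None and abs(group_iq - teacher_estimate) >= gap:
        return True
    return False


# Hypothetical class list: (pupil, group-test IQ, teacher's estimate).
class_list = [
    ("A", 75, 85),
    ("B", 102, 100),
    ("C", 134, 125),
    ("D", 98, 118),
]
to_retest = [name for name, iq, est in class_list if needs_individual_test(iq, est)]
```

Pupils with language difficulty would be added to the list by hand, since no score on a group test identifies them reliably.
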

A reasonably complete testing program will require as a rule the use of
intelligence tests along with achievement tests. Because of the relative
constancy of the IQ it is unnecessary to administer intelligence tests each
year. The mental level of most pupils can be predicted closely enough from
intelligence tests periodically scheduled to permit ordinary comparisons
with achievement. Page 225 outlines intelligence testing programs adapted
to various types of school organization. At times aptitude tests in specific
fields, rating scales, check lists, interviews, and the like will also


-statewide .utvey. la M.ssacbiisetU, aad New Jeney mdteate this elearly See Test 
Service BuUehne No SS rrui it World „,i„e,ted admm strator 

to have test caUloguee of the companies listed noth astenata in App v 





TABLE 27

Advantages and Limitations of Standardized and Nonstandardized Tests of Achievement

STANDARDIZED

1. Validity
   a. Curricular
      Advantages: Careful selection by competent persons after experimentation.
      Fit typical situations.
      Limitations: Inflexible. Too general in scope to meet local requirements
      fully, especially in unusual situations.
   b. Statistical
      Advantages: With best tests, high.
      Limitations: Criteria often defective. Size of coefficients largely
      dependent upon range of ability in group tested.

2. Reliability
      Advantages: With best tests very high, usually above .90, often above .95.
      Usually fully objective.
      Limitations: No guarantee of validity. Depends upon range of ability in
      group tested.

3. Usability
   a. Ease of Administration
      Advantages: Definite procedure, time limits, etc. Economy of time.
      Limitations: Manuals require study and are sometimes inadequate.
   b. Ease of Scoring
      Advantages: Definite rules, keys, etc. Largely routine.
      Limitations: May take considerable time. Monotonous.
   c. Ease of Interpretation
      Advantages: Better tests have adequate norms. Useful basis of comparison.
      Equivalent forms.
      Limitations: Norms often confused with standards. Some norms defective.
      Norms for various types of schools and levels of ability are often lacking.

Summary: Main Points Pro and Con
      Advantages: Convenience, comparability, objectivity. Equivalent forms may
      be available.
      Limitations: Inflexibility.


be required. The particular combination of measuring techniques required
in any given situation will depend upon the specific purposes to be served.
As a rule, classroom teachers will find a larger place for nonstandardized
teacher-made tests in the solution of instructional problems than will
school administrators in the solution of administrative problems. The re-








TABLE 27 (continued)

Advantages and Limitations of Standardized and Nonstandardized
Tests of Achievement

NONSTANDARDIZED: ESSAY AND OBJECTIVE

1. Validity
   a. Curricular
      Essay, advantages: Useful for English, advanced classes; afford
        language training. May encourage sound study habits.
      Essay, limitations: Limited sampling. Bluffing is possible. Mix
        language factor in all scores.
      Objective, advantages: Extensive sampling of subject matter. Flexible
        in use. Discourage bluffing. Easier to prevent and to detect
        cheating.
      Objective, limitations: Narrow sampling of functions tested. Negative
        learning possible. Piecemeal study encouraged.
   b. Statistical
      Essay, limitations: Usually not known.
      Objective, advantages: Compares favorably with standard tests.
      Objective, limitations: Adequate criteria usually lacking.

2. Reliability
      Essay, advantages: Inexperienced teacher may do better than with
        objective types.
      Essay, limitations: Average is low. Subjective scoring.
      Objective, advantages: Sometimes approaches that of standard tests.
        Objective scoring.
      Objective, limitations: No guarantee of validity.

3. Usability
   a. Ease of Administration
      Essay, advantages: Easy to prepare. Easy to give.
      Essay, limitations: Lack of uniformity.
      Objective, advantages: Directions rather uniform. Economy of time.
      Objective, limitations: Difficult to prepare.
   b. Ease of Scoring
      Essay, limitations: Slow, uncertain, and subjective.
      Objective, advantages: Definite rules, keys, etc. Largely routine.
      Objective, limitations: May take considerable time. Monotonous.
   c. Ease of Interpretation
      Essay, limitations: No norms. Meaning doubtful.
      Objective, advantages: Local norms can be derived.
      Objective, limitations: No norms available at beginning.

Summary
      Essay, advantages: Useful for part of many tests and in a few special
        fields.
      Essay, limitations: Limited sampling. Subjective scoring. Time
        consuming.
      Objective, advantages: Extensive sampling. Objective scoring.
        Flexibility.
      Objective, limitations: Preparation requires skill and time.

conM,on «U ‘X lu— " t the etaef adv» 

sort of "balance ^heef' vement teecs It ce evcclea' 

equally good for all purposes 










What is the best procedure? Regardless of the purpose of the testing
program or who makes the selection of tests, it is important that a
systematic, businesslike procedure be employed. Users of standard tests will
find the information contained in The Mental Measurements Yearbooks of great
value. The comprehensive character of the tests reviewed in this publication
is indicated by the "Classification of Tests" in The Fourth Mental
Measurements Yearbook, which is shown in Table 28.

TABLE 28

Classification of Tests in The Fourth Mental Measurements Yearbook (1953)

                                          TEST
ACHIEVEMENT BATTERIES                        1
CHARACTER AND PERSONALITY
  Nonprojective                             27
  Projective                               102
ENGLISH                                    148
  Composition                              178
  Literature                               180
  Spelling                                 198
  Vocabulary                               213
FINE ARTS
  Art                                      219
  Music                                    225
FOREIGN LANGUAGES                          232
  English                                  233
  French                                   236
  German                                   244
  Greek                                    248
  Italian                                  249
  Latin                                    250
  Spanish                                  259
INTELLIGENCE
  Group                                    267
  Individual                               334
MATHEMATICS                                365
  Algebra                                  380
  Arithmetic                               393
  Geometry                                 422
  Trigonometry                             438
MISCELLANEOUS
  Agriculture                              441
  Business Education                       443
  Computational and Scoring Devices        464
  Etiquette                                471
  Handwriting                              475
  Health                                   478
  Home Economics                           491
  Industrial Arts                          503
  Philosophy                               505
  Psychology                               507
  Record and Report Forms                  510
  Religious Education                      518
  Safety Education                         521
  Testing Programs                         526
READING                                    528
  Miscellaneous                            561
  Oral                                     565
  Readiness                                566
  Special Fields                           573
  Study Skills                             578
SCIENCE                                    580
  Biology                                  596
  Chemistry                                607
  General Science                          623
  Geology                                  630
  Miscellaneous                            631
  Physics                                  633
SENSORY-MOTOR                              644
  Hearing                                  646
  Motor                                    649
  Vision                                   654
SOCIAL STUDIES                             662
  Economics                                670
  Geography                                674
  History                                  679
  Political Science                        698
  Sociology                                708
VOCATIONS                                  710
  Clerical                                 719
  Interests                                736
  Manual Dexterity                         749
  Mechanical Ability                       756
  Miscellaneous                            777
  Specific Vocations                       785
    Accounting                             787
    Dentistry                              788
    Drivers                                789
    Education                              792
    Engineering                            808
    Law                                    814
    Machinists                             816
    Medicine                               817
    Nursing                                818
    Salesmen                               824

As an illustration of the type of evaluations in this volume, the following
excerpts from comments on the Primary Mental Abilities tests are given.

Anne Anastasi, Professor of Psychology at Fordham University, criticizes
the PMA tests on the basis of the methods used to estimate their
reliability:

A special weakness of the entire PMA series is the treatment of test reliability.
In tests such as these, designed for intra-individual comparisons and profile analysis,
the need for proper determination and reporting of reliability is particularly urgent.
Yet in the various forms of the PMA tests, reliability coefficients are either
inadequately reported, incorrectly computed, or completely omitted. Odd-even and
Kuder-Richardson techniques have been repeatedly employed in finding the
reliability of speeded tests [for which they are not suitable]. In several forms no
recognition is given to this problem at all, spurious and meaningless reliabilities
as high as .98 being reported without comment, except to say that the reliabilities
would probably be higher in more heterogeneous samples.


Ralph F. Berdie, Professor of Psychology and Director of the Student
Counseling Bureau at the University of Minnesota, does not deliver a
favorable final verdict:


In general one would expect these tests to be a great contribution to education
and guidance. That they have not been may be due either to the test itself or to
the inadequate follow-up work that the authors or others have done. It may be
that in attempting to produce a test that requires relatively little time or money
the publishers have sacrificed those very things that made the tests potentially
valuable. It is too bad that after such tests have been available for more than
14 years, one must still conclude that their principal uses are experimental.


John B. Carroll, Associate Professor of Education at Harvard University,
is more complimentary:

The authors are undoubtedly on solid ground in their discussions of the Verbal
Meaning and Reasoning factors. In all probability the statements which the
authors make about the Number and the Space factors and their relevance to
certain types of curricula and jobs will eventually be substantiated in validity
studies, but this is only the reviewer's hunch.


Stuart A. Courtis, Professor Emeritus of Education at the University of
Michigan, starts off flatteringly but ends by questioning the value of all
tests:


No tests this reviewer has ever seen or used approach the PMA tests in the care
and ingenuity evident in their construction. The authors have very wisely broken
away from conventional memory question-response type of items. In all tests the
exercises involve mental functioning in action. In other respects, the
manuals might well serve as models for all publishers of tests to follow. This
reviewer, however, rejects as inappropriate or actually false both the statistical meth-


Oscar K. Buros (Editor), The Fourth Mental Measurements Yearbook, page vii.
Highland Park, New Jersey: The Gryphon Press, 1953.
Ibid., pages 698-710.

TABLE 29

Otis Scale for Rating Standard Tests

Names of Tests

Scale for Rating Tests

  Manual (5)
  Validity (15)
  Reliability (10)
  Reputation (5)
  Ease of Administration (Total 15)
    (a) Preparation (4)
    (b) Time limits (4)
    (c) Explanation needed (3)
    (d) Alternative forms (4)
  Ease of Scoring (Total 13)
    (a) Objectivity (10)
    (b) Time required (3)








P. E. Vernon, Professor of Educational Psychology in the Institute of
Education of the University of London, seems favorably impressed. He
points out that:


Thurstone has clearly retreated from his earlier opposition to "general"
intelligence. He not only allows total scores to be calculated for each battery,
but even provides for their conversion to IQ's.

Thus there are reviews of the PMA series by five different persons, which
fill ten large double-column pages and together cover almost all points of
interest to nearly any prospective user of the tests. Few of the other tests
get this much coverage; many are reviewed by only a single person.
The Fourth Yearbook also cites nine reviews of PMA tests in previous
yearbooks.

In the choice of standard tests it is always wise to have available for
careful examination both the test blanks and the test manuals of all tests
being considered. Most county and city school systems will find it desirable
to maintain for such purposes up-to-date sample ("specimen") sets of
the more important tests published. To assist in making the necessary
examinations and comparisons, the use of a rating scale will be found helpful.
The first one published, prepared by Otis, is reproduced in Table 29.
A more analytical scale for evaluating achievement tests is that of Cole
and von Borgersrode, given in Table 30.
The use of these scales not only directs attention to significant points
but also gives some idea of the relative weight of the various items. In the
authors' opinion, Cole and von Borgersrode assign too much weight to
reliability, and both scales assign too little weight to validity, the most
important quality of any measuring instrument. Also, the relative weight
assigned by both these scales to what may be termed usability seems
somewhat heavy. The authors suggest a slight revision in weights and a
regrouping of sections IV, V, VI, and VII under the heading usability. The
major divisions and subdivisions, with revised weightings, would then be
as follows:



Divtston 


Point* 

I 

I’relimmarv Infonnafioo 


50 

II 

Validity 

*^0 


A Curricular 




B Statistical 

20 

20 

III 

Reliability 

A More important points 

IE 



B Les« important points 


30 

IV 

Usability 

A Ease of administration 

B Ea^e of scoring 

IQ 

6 

IG 




C E.ise of interpretation 



D Miscellaneous 

Total 


lOJ 



TABLE 30

Cole-von Borgersrode Scale for Rating Standardized Tests


I. Preliminary Information
   1. Exact name of test
   2. Name and position of author
   3. Name of publisher and nearest address
   4. Cost
   5. Date of copyright
   6. Purpose of test

II. Validity (25)
   A. Curricular (15)
      1. Exact field or range of educational functions which test measures?
      2. Ages and grades for which intended?
      3. Criteria with which material was correlated?
      4. Do questions parallel good teaching procedures?
      5. How wide is sampling of important topics?
      6. What is the social utility of questions?
      7. Is test claimed to be diagnostic? (If so, see 5, c, below)
   B. Statistical (10)
      1. Correlated against what outside criteria?
      2. Size of coefficient of correlation?
      3. Size and representativeness of sampling?
      4. Proof of adequacy of items (such as statements as to experimental
         tryout of items individually to determine that no large percentage
         is failed or passed by all pupils and that the items show a
         consistent increase of percentages of successes with successive age
         or grade levels).

III. Reliability (25)
   A. Most important points
      1. Correlated with what?
      2. Size and representativeness of sampling?
      3. Reliability coefficients
      4. The means of the distributions
      5. The standard deviations of the distributions
      6. Other similar statistics
      7. Intercorrelations
   B. Less important but desirable
      1. Order of giving various forms of test
      2. Is test reliable enough statistically for individual measurement,
         or should it be used only for groups?
      3. Evenness of scaling (see II, B, 4)
      4. Are pupils accustomed to this type of test?

IV. Ease of Administration (15)
   1. Manual of Directions (3)
      a. How complete and simple is the manual?
      b. Does manual control test conditions well?
      c. Typographic makeup


Robert D. Cole and Fred von Borgersrode, "A Scale for Rating Standardized
Tests," School of Education Record of the University of North Dakota,
14:11-15, October 1928.



TABLE 30 (Continued)

Cole-von Borgersrode Scale for Rating Standardized Tests


IV. Ease of Administration (15) (Cont.)
   2. Simplicity of administration (9)
      a. Amount of explanation needed for pupils by examiner?
      b. Are directions to pupils clear, detailed, comprehensive?
      c. Is arrangement of test convenient for pupils?
      d. Are samples and fore-exercises given when needed?
      e. Time needed for giving?
   3. Alternate forms (3)
      a. Number
      b. Evidence of reliability
      c. Evidence of equivalency

V. Ease of Scoring (10)
   1. Degree of objectivity: purely objective or some judgment on part of
      examiner?
   2. Are adequate directions given: clear, equal to all emergencies?
   3. Is scoring adjusted to size of test?
   4. Time needed to score one test
   5. Simplicity of procedure
   6. Number of processes needed to get final score?

VI. Ease of Interpretation (20)
   1. Norms (6)
      a. Kind: age, grade, percentile, standard score, etc.
      b. Derivation: size and representativeness of sampling
      c. Tentative, arbitrary, or experimental?
      d. For separate parts?
      e. How expressed?
   2. Is class record provided? (1)
   3. Are there provisions for graphing results? (1)
   4. Is interpretation of scores easy or hard? (2)
   5. Application of results (10)
      a. Are directions or suggestions given for application of results to
         benefit teaching or administration?
      b. Are tests survey or diagnostic?
      c. If diagnostic:
         (1) Proof of diagnostic value?
         (2) What principle or principles underlie construction?
         (3) How many different skills, abilities, or aspects of the subject
             are analyzed or measured?
         (4) Does the analysis of total subjects into unit abilities follow
             teaching practices?
         (5) Is the diagnosis individual or class? Proof?
         (6) Does the test demand tabulations of individual pupils' errors
             to secure a diagnosis?
         (7) Is a remedial program provided or suggested?

VII. Miscellaneous (5)
   1. Typography and makeup
      a. Arrangement of printed matter
      b. Legibility of type
      c. Quality of paper
      d. Are test blanks free from distractions: norms, directions to
         examiner, etc.?



TABLE 30 (Concluded)

Cole-von Borgersrode Scale for Rating Standardized Tests


VII. Miscellaneous (5) (Cont.)
   2. Is the time required for giving as small as is consistent with
      reliable measurement?
   3. Is the cost in keeping with the amount, scope, and reliability of the
      results yielded?
   4. Is good test service provided by the publisher?
   5. Kind of objective questions


A desirable procedure is to have a group of at least three competent
people, each independent of the others, look over all the tests being
considered, the manuals accompanying them, and any evaluations available.
Each judge first compares the tests with respect to validity, and records
the judgment in points before considering anything else. Then he goes on
to reliability and makes a similar judgment on each test. Finally, he does
the same for usability. This method will tend to produce greater agreement
among the judges regarding the relative ranks of the tests on the criteria
individually. After all, the total point score allowed a test is less important
than the rating on the divisions separately.
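
The judging procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the division maxima are the point allotments printed in Table 30, while the judge names, test names, ratings, and the function names `average_by_division` and `rank_tests` are all hypothetical. Averaging the judges' points is one reasonable way to pool their ratings; the text itself does not prescribe a pooling rule.

```python
from statistics import mean

# Maximum points per division, as printed in Table 30.
MAX_POINTS = {
    "validity": 25,
    "reliability": 25,
    "ease_of_administration": 15,
    "ease_of_scoring": 10,
    "ease_of_interpretation": 20,
    "miscellaneous": 5,
}


def average_by_division(ratings):
    """Average the judges' points for every test on every division.

    ratings maps judge -> test -> division -> points awarded.
    """
    collected = {}
    for by_test in ratings.values():
        for test, by_division in by_test.items():
            for division, points in by_division.items():
                if not 0 <= points <= MAX_POINTS[division]:
                    raise ValueError(f"{points} is out of range for {division}")
                collected.setdefault(test, {}).setdefault(division, []).append(points)
    return {
        test: {division: mean(values) for division, values in divisions.items()}
        for test, divisions in collected.items()
    }


def rank_tests(averages, division):
    """Order test names from highest to lowest mean rating on one division."""
    return sorted(averages, key=lambda test: averages[test][division], reverse=True)


# Hypothetical ratings from three independent judges on two made-up tests.
ratings = {
    "judge_1": {"Test A": {"validity": 20, "reliability": 18},
                "Test B": {"validity": 15, "reliability": 22}},
    "judge_2": {"Test A": {"validity": 18, "reliability": 20},
                "Test B": {"validity": 17, "reliability": 21}},
    "judge_3": {"Test A": {"validity": 22, "reliability": 19},
                "Test B": {"validity": 16, "reliability": 23}},
}

averages = average_by_division(ratings)
validity_order = rank_tests(averages, "validity")
reliability_order = rank_tests(averages, "reliability")
```

Keeping the ranks separate for each division, rather than collapsing them into one total, follows the point made above that the rating on the divisions matters more than the total score.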
Emphasizing the close relationship between teaching and testing, Brownell
suggests the following criteria for evaluating tests:

1. Does the test elicit from the pupils the desired types of mental processes?

2. Does the test enable the teacher to observe and analyze the thought processes
which lie back of the pupils' answers?

3. Does the test encourage the development of desirable study habits?

4. Does the test lead to improved instructional practice?

5. Does the test foster wholesome relationships between teacher and pupil?

In selecting a test for a given purpose, the grade level on which it is to
be used must be given consideration. Test publishers often suggest a
considerable grade range in which the test may be used. But both test authors
and publishers tend to be too optimistic concerning the range of usefulness
of their tests. For example, an intelligence test that is supposed to be
suitable for grades three to eight may be found to be too difficult for the third
grade and too easy for the eighth. The reader will doubtless recall from a
discussion in Chapter 4 that it has usually been found that a test has
optimum discrimination for a group whose average corrected-for-chance
score is approximately 50 per cent of the maximum score possible on the
test. It must be remembered, however, that the discriminating function of
diagnostic and certain other specific tests is usually relatively unimportant.
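
A short worked sketch may make the 50 per cent rule concrete. It assumes the conventional correction-for-chance formula, rights minus wrongs divided by one less than the number of options per item, with omitted items not counted; Chapter 4 is not reproduced here, so that formula is an assumption, and the function names and the sample papers are hypothetical.

```python
def corrected_score(right, wrong, options):
    """Rights minus wrongs/(options - 1); omitted items are not penalized."""
    if options < 2:
        raise ValueError("an item needs at least two options")
    return right - wrong / (options - 1)


def mean_corrected_fraction(papers, n_items, options):
    """Mean corrected score of a group as a fraction of the maximum score.

    papers is a list of (right, wrong) counts, one pair per pupil.
    """
    scores = [corrected_score(right, wrong, options) for right, wrong in papers]
    return sum(scores) / len(scores) / n_items


# Hypothetical class of three pupils on a 60-item, four-option test.
papers = [(40, 12), (30, 18), (35, 15)]
fraction = mean_corrected_fraction(papers, n_items=60, options=4)
```

A group whose fraction comes out near 0.5, as this made-up class does, is at about the level the paragraph above describes as giving optimum discrimination; a fraction far above or below that suggests the test is too easy or too hard for the grade.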

William A. Brownell, "Some Neglected Criteria for Evaluating Classroom Tests,"
National Elementary Principal, 16:485-492, July, 1937.


C. Administering the Tests

The next step in the testing program is the administering of the tests.
In order to insure that this is properly done, three questions must be
answered:


1. When should the tests be administered?

2. Who should administer the tests?

3. What is the correct procedure to follow?

Each of these questions deserves careful consideration.
When should the tests be administered? As problems concerning
the use of intelligence tests differ somewhat from those concerning the use
of achievement tests alone, it is better to consider the two separately. When
should intelligence tests be administered? There is general agreement that
it is not necessary to give the same pupils intelligence tests every year, but
there is also agreement that possible fluctuations on group tests are great
enough to warrant giving such tests more than once. The fluctuations are
likely to be most serious in the primary grades. A reasonable plan
employed by many school systems is to give intelligence tests at transitional
points in the pupil's school history. As Stoddard suggests, "Intelligence is
analogous to health; any estimate of it should be rechecked close to the
making of an important decision." Procedure would therefore vary
according to the school organization. A suggested minimum program is as
follows:


Type of Organization        Grades to Give Intelligence Tests

Six-six plan                First, and sixth or seventh
Seven-five plan             First, and seventh or eighth
Eight-four plan             First, and eighth or ninth
Six-three-three plan        First, sixth or seventh, and ninth or tenth
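
For a school office that keeps its testing calendar in a program, the minimum schedule above could be held as a small lookup table. The sketch below is only one possible representation: the dictionary name, the plan keys, and the helper `is_testing_grade` are hypothetical, and each inner tuple lists the alternative grades at which one testing point may fall.

```python
# Each plan maps to its testing points; a point is a tuple of acceptable grades.
MINIMUM_PROGRAM = {
    "six-six": [(1,), (6, 7)],
    "seven-five": [(1,), (7, 8)],
    "eight-four": [(1,), (8, 9)],
    "six-three-three": [(1,), (6, 7), (9, 10)],
}


def is_testing_grade(plan, grade):
    """True if the grade is one of the acceptable grades for some testing point."""
    return any(grade in point for point in MINIMUM_PROGRAM[plan])


ninth_in_633 = is_testing_grade("six-three-three", 9)
sixth_in_84 = is_testing_grade("eight-four", 6)
```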


If possible, it would be well to add to this minimum program a test at about
the fourth grade and one at the end of the high school course.
There is some disagreement regarding the best time of year in which to
give the intelligence tests. Of course, if the tests are to have maximum
value, their results must be made available at the very beginning of these
transitional periods. This means they should be given early in the first
grade if the pupils have had no previous kindergarten experience. Since
Updegraff found that for preschool children the reliability of the test is
increased by postponing testing until two weeks after entrance to school,
it may be well to avoid giving the test till the second or third week of school
in the lower grades. The later tests can be given either at the beginning of
the transitional year or at the close of the year preceding. There is a
tendency to have tests for college entrance administered in the high schools
near the close of the senior year. This is obviously necessary if such tests
are to be used in counseling these seniors regarding the feasibility of
continuing their education. There will usually be a few pupils who will transfer
into the system and who have not had intelligence tests, and others in the
system about whom teachers may feel serious doubt regarding the validity
of the existing record.

Cf. Mildred M. Allen, "Relationship between the Indices of Intelligence Derived
from the Kuhlmann-Anderson Intelligence Tests for Grade I and the Same Tests for
Grade IV," Journal of Educational Psychology, 36:252-256, April 1945.

George D. Stoddard, The Meaning of Intelligence, page 94. New York: The
Macmillan Company, 1943.

Ruth Updegraff, "The Determination of a Reliable Intelligence Quotient for the
Young Child," Pedagogical Seminary and Journal of Genetic Psychology, 41:152-166,
September 1932.

The frequency with which achievement tests should be used will depend
primarily upon the purpose they are to serve. Most purposes, however,
will require at least two series of tests administered at intervals of a
semester or a year. Most achievement tests have norms for the middle and
the end of the year, but often for no other time. When tests are given at
these periods, comparisons with norms are easiest. There is also the fact
that many studies have shown a considerable decline in knowledge at the
end of the summer vacation. This would seem to favor giving the tests
at the end of the school year, when the pupils' status is more normal.
A comparison between the records made by pupils at the end of each of two
successive years is usually more trustworthy than that between the
beginning and end of one year.

There are some advantages in having the tests administered in the fall.
Almost always some pupils will enter the school for the first time, and their
status can best be determined by administering tests to all the pupils. The
teachers will then have the entire school year in which to remedy any
deficiencies revealed. Fall testing also avoids the undesirable practice of
cramming. If too much emphasis is placed on "improvement" shown during
the year, however, pupils may be tempted not to do their best on the first
series of tests. This would not be the case if progress is measured between
two series of tests administered at the end of the preceding year and at the
end of the current school year.

This practice will also make it possible to have the information serve
several purposes. It can be used partially as a basis for determining
promotion from the grade, for educational guidance, and possibly for
sectioning the next grade. There seems also no good reason why an analysis of the
errors revealed cannot serve equally well as a basis for remedial teaching
in the succeeding grade as if the new teacher had given the test at the
beginning of the year. Of course, in some instances there might be
considerable value in repeating the test at the beginning of the year in order
to determine the effects of the summer vacation, apart from the better
established weaknesses which were present when the vacation started.
Moreover, the analysis of errors is more trustworthy when based upon two
samplings of performance than upon one.

Who should administer the tests? Obviously, only competent persons
should do so. It is not always an easy matter to
tell who is really competent, however. In the case of individual tests of the
Stanford-Binet type, this requirement means that only persons who have
had training in college classes should attempt to administer
them. There should be at least one person in every school who is qualified
to give such tests. When tests are used for purposes of research, or when
they are used to compare one grade, class, or school with others, they should
usually be given by one person, or a small group of specially trained
examiners. But in the ordinary testing program, employing group intelligence
tests and achievement tests, the regular classroom teachers should usually
administer the tests. Most of them will welcome an opportunity to do so.
At the present time there seems no good reason for selecting a test whose
administration is so difficult as to be beyond the mastery of average
teachers in the public schools. The point of view of McCall seems eminently
sound:


Many years ago certain specialists sought to secure a monopoly of the privilege
of using standard tests by trying to persuade educators to regard the tests as
possessing certain mystic properties. A few of us with Promethean tendencies
set about taking these sacred cows away from the gods and giving them to mortals.
Can teachers be entrusted with tests? If not, then teachers ought not to be trusted
with 90 per cent of their present functions. We now entrust them with the far more
difficult task of teaching reading, creating concepts, and building ideals. Let us not
strain at a gnat when we have swallowed fifty elephants.


But it is well not to take the competency of the examiners for granted.
One of the best plans is to get the group of examiners together and
demonstrate the administration of the tests to be used. One way to do this is to
give a demonstration with a regular class and to follow this by a discussion
with the examiners of the procedure they have seen. Another way is to
administer the test to the examiners themselves. This should be followed
by a full discussion of the procedure involved. It is usually well to suggest
that after each examiner has studied the manual he try the procedure on
some other person, such as a member of the family, or two teachers may
try it out on each other. If questions then arise, they can be settled by a
conference with the person in general charge of the program before the
examiner goes before his group actually to administer the test. It has been
found that, if such measures are taken, the regular classroom teachers can
obtain practically the same results with group tests as can be obtained by
special examiners.

iMVA MrCII ,n n. rrs, MelUr. pnI.I»hBO by Bureau ot Pubheanoua, Teacher- 
College, Colmobia University. December Wd 



What procedure should be followed? Although the procedure of
administering group intelligence tests and achievement tests is not beyond
the mastery of classroom teachers and school administrators, some
difficulties may arise. In fact, Ligon argues that good group testing is more
difficult than individual testing. In the first place, the conditions for the
test must be favorable. It is usually best to have the tests given in the
familiar environment of the pupils' own classrooms. Especially is this true
of younger children. It is well always to have the tests given at regular
class time without permitting them to run over into lunch hour or play
time. For the same reason it is desirable not to have tests just before or
just after an important event, such as a holiday, a school party, or an
athletic contest. Precautions should be taken to avoid all unnecessary
distractions and interruptions during the progress of the test. It is a good plan
to hang on the outside of the classroom door a card which reads: "Tests
Going On. Please Do Not Disturb." Pupils should be instructed to remove
everything from the tops of their desks except two well-sharpened pencils
and an eraser. The examiner should also have ready a few extra pencils
in case of an emergency. All these things must be looked after in order to
insure favorable working conditions for the test.
As a rule, anyone can administer a group test successfully who meets
three requirements. The first of these is the ability to read well. Good
silent reading is required for the mastery of the directions printed in the
manual which accompanies the test. Good oral reading ability is required,
for the directions to the pupils should be read, not recited from memory.
To undertake to give the test from memory is to run a serious risk of leaving
out some important word or phrase or of paraphrasing the directions in
such a way as to change their meaning. But the examiner should be so
familiar with the manual that he can read the directions with his eyes off
the page a good part of the time. The directions should be read with proper
emphasis in a clear voice just loud enough to be heard throughout the room.
The aim should be to make the meaning understood without arousing
anxiety or excitement.

The second requirement for administering a test is the ability to keep
time accurately. If the test has a single time limit of, say, twenty minutes
or more, it is probably preferable to time it with an ordinary pocket watch
rather than a stop watch, since the latter may, upon occasion, be quite
erratic. When a pocket watch is used, set its hands to some convenient
time, such as the beginning of an hour, and give the starting signal just as
the second hand reaches 60 (which is also 0). It will usually help students
and examiner alike to have a clock in the room which shows everyone the
correct time. Some testers use a special device known as an interval timer.

The aim should be to keep the time to a second. On most tests the signal


Ernest M. Ligon, "The Administration of Group Tests," Educational and
Psychological Measurement, 2:387-399, October, 1942.



Test 1
                        Hr.    Min.    Sec.
Time test began          0      0       0
Time allowed                    5
Time to stop             0      5       0


Experienced examiners know that it is never safe to trust one's memory
to keep the time. A written record must be made.
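
The arithmetic behind such a written record is simple enough to sketch. The example below is an illustration only: the function name `stop_time` and the sample readings are hypothetical, and the standard-library `timedelta` is used merely so that seconds carry into minutes and minutes into hours without hand calculation.

```python
from datetime import timedelta


def stop_time(began, allowed):
    """Add the time allowed to the starting reading of the watch.

    Both arguments and the result are (hours, minutes, seconds) tuples.
    """
    total = (timedelta(hours=began[0], minutes=began[1], seconds=began[2])
             + timedelta(hours=allowed[0], minutes=allowed[1], seconds=allowed[2]))
    seconds = int(total.total_seconds())
    return (seconds // 3600, seconds % 3600 // 60, seconds % 60)


# The record shown above: watch set to 0:00:00, five minutes allowed.
first_stop = stop_time((0, 0, 0), (0, 5, 0))

# A hypothetical later subtest that starts at an awkward reading.
second_stop = stop_time((9, 58, 30), (0, 2, 45))
```

Starting the watch at the beginning of an hour, as the text recommends, keeps the first case trivial; the second case shows why a written stop time is safer than mental arithmetic when the start falls at an odd reading.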
The third requirement for administering a test is the ability to follow
directions accurately. The manual should be followed verbatim. No
deviation whatsoever is permissible. To add anything to or to modify the
directions in any way means that it is no longer a standardized test. Boynton
gives some interesting illustrations of unconscious clues given by
inexperienced examiners using the Stanford-Binet. One examiner, for example,
when asking the meaning of the word "tap" in the vocabulary test began
to tap on the table, and when he came to the word "eyelash" he looked the
child straight in the eye and batted his eyes rapidly. The norms are made
on the assumption that a prescribed formula is to be used. As a part of the
preliminary instructions pupils are almost always told not to ask any
questions after the test starts. Occasionally a pupil forgets this instruction and
holds up his hand for a question. The examiner should walk over to him
and, if it is a reading test or an intelligence test whose purpose is following
directions, should say in a quiet voice, "Read it carefully and do just what
it says." If it is an ordinary achievement test and the pupil is concerned
about where to put his answer or some other point of mechanics that does
not involve the answer to a question in the test or modify the directions
already given, it is permissible to set the pupil at ease without causing
disturbance. Kelley suggests this principle in handling the child who is in
trouble: "The examiner should be free to say or do anything that does not
disturb or delay pupils at work, that does not help the individual child in
the thing in which he is being tested, and that does set him to work again
after some foolish or trivial issue has troubled him." Examples of
permissible statements are: "Yes, you may change your response if you decide
it is wrong." "Just work on the side of the sheet; you do not need scratch
paper." "When you have finished the first column go right on to the next
one." "No, you must not go back to a test you have passed," and the like.
But if the pupil asks the meaning or spelling of a word, or how to answer


Paul L. Boynton, Intelligence: Its Manifestations and Measurement, pages 270-277.

Truman L. Kelley, Interpretation of Educational Measurements. Yonkers:
World Book Company, 1927.





a test item, the examiner should say quietly, "I cannot tell you. Go on to
the next one." In case of doubt, the examiner should err on the side of saying
nothing. While the test is in progress the examiner must be alert constantly
to see that the pupils neither help nor hinder each other nor are distracted
by external factors. Ligon indicates the following requirements of good
group testing: "That all the subjects understand the instructions, that they
all work throughout the assigned time at their optimum level of
achievement, that they are in no way helped, hindered, or distracted by one
another, that they do not quit trying or omit any section of the test, that
examiners give instructions adequately and m a stimulating, effective tone 
of voice — not a dull bored monotone — and that proctors are observing 
ev ery movement of the group, stimulating lagging souls, inhibiting n ander- 
mg ejes, and detecting failure to follow instructions ” A test is more than 
a measunng devnee, it presents a standardized situation in which to observe 
pupil behavior Anj occurrence observed dunng the progress of the test 
that may throw light upon, the interpretation of the results should be care- 
fully recorded 

D. Scoring the Tests 

It is desirable to have the tests scored as quickly as possible and with
the highest possible degree of accuracy. As a rule, then, that system is best
which accomplishes these objectives with the minimum expenditure of
money, time, and energy. There are two questions involved:

1. Who should score the tests?

2. What technique should be used?

Who should score the tests? In actual practice, standard tests are
scored by a variety of persons. Sometimes, especially in larger systems, the
work is done by a clerical staff at a central bureau, or by the use of scoring
machines; the scoring may be contracted for with some outside agency,
such as the test publisher; sometimes it is done by advanced students under
supervision; at other times the scoring is done by administrative officials;
but the most common method seems to be to have the work done by the
regular teachers. Except in the larger systems where there is a bureau of
research equipped with special facilities, the scoring is probably best done
by the classroom teachers. In that way not only can the work be done
promptly, but the teachers can probably learn something of value about
the types of errors made on the achievement tests. But it is important to
get the scoring done without producing an unfavorable attitude toward it
on the part of the teachers. Some schools have found it very satisfactory
to dismiss classes at noon when the testing is in progress, so that the
teachers can devote the afternoon to the work of scoring. This would seem
Ernest M. Ligon, op. cit., page 387.

an effective way of emphasizing the important fact that teaching and testing
are processes that are intimately related.

What technique should be used? Care should be taken to assure a high
degree of accuracy in scoring. It must not be assumed that merely because
the directions are clear, the key complete, the separate answer sheets well
designed, and the process entirely objective, perfect protection against
errors is thereby afforded. Numerous studies give abundant evidence to
contradict this assumption. They reveal two distinct types of errors in
scoring: constant errors and variable errors. A common example of the former
type is misunderstanding the scoring directions, for instance, by counting
omissions the same as errors when using the scoring or correction formula.
Such errors are especially serious, because there is no possibility of their
offsetting each other according to any so-called "law of averages." Variable
errors, on the other hand, sometimes tend to make the score too high and at
other times too low. While such errors may do serious harm to individual
pupils, they tend to cancel each other in group measures such as averages.
Examples of variable errors are errors resulting from carelessness, errors in
counting the scores, errors in entering the scores on the front of the test
booklet or on the record sheet, and errors in adding up the total score. Some
of the most serious errors found are not in marking the paper at all but in
counting and in addition.
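
The difference between the two types of error may be made concrete by a small
sketch in Python. It is an illustration only, not a procedure from the studies
cited: the function name, the invented scores of 300 pupils, and the sizes of
the errors are all assumptions chosen for the example.

```python
import random
import statistics


def record_scores(true_scores, constant_error=0, variable_spread=0, seed=1):
    """Return the scores a scorer would record, given two kinds of error.

    constant_error  -- the same amount added to every paper (hypothetical)
    variable_spread -- each paper is off by a random whole number between
                       -variable_spread and +variable_spread (hypothetical)
    """
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    recorded = []
    for score in true_scores:
        wobble = rng.randint(-variable_spread, variable_spread)
        recorded.append(score + constant_error + wobble)
    return recorded


# Invented true scores for 300 pupils.
true_scores = [20 + (i % 15) for i in range(300)]
true_mean = statistics.mean(true_scores)

# Constant error: every paper is scored 2 points too low.
constant_only = record_scores(true_scores, constant_error=-2)
constant_shift = statistics.mean(constant_only) - true_mean

# Variable error: each paper is off by as much as 3 points either way.
variable_only = record_scores(true_scores, variable_spread=3)
variable_shift = statistics.mean(variable_only) - true_mean
pupils_harmed = sum(1 for t, r in zip(true_scores, variable_only) if t != r)

print("shift in class average from constant error:", constant_shift)
print("shift in class average from variable error:", round(variable_shift, 2))
print("pupils whose own score is wrong under variable error:", pupils_harmed)
```

Under these assumptions the constant error moves the class average by its full
amount, while the variable error leaves the average nearly untouched even
though many individual pupils carry a wrong score.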
Clearly, then, accuracy in scoring cannot be taken for granted. What is
to be done about it? The first thing is to prevent the occurrence of errors
whenever possible. The scorers must be shown how to score the papers and
not merely told how to do it. They should be given an opportunity to study
the manual and the scoring keys. Whenever possible, an actual demonstration
of scoring should follow. It is a good idea, also, to check carefully the
first few papers marked by beginners to detect errors at the outset. This
procedure should reveal any constant errors and the principal types of
variable errors. It is always desirable to have each page or part of the test
scored through all the papers in a set before going on to the second page
or part of the test. If the scorers work in groups, as is usually desirable,
each one can specialize in marking one part of the test, and pass the test
when scored to the next scorer, who is specializing in marking the next part
of the test. This procedure will reduce the risk of error and at the same time
will increase the speed of scoring. It is usually an especially poor technique
to have one person read the answers while the scorers mark the papers.
This is slow, because the slowest scorer sets the pace. It also increases the
risk of error, owing to the possibilities of losing the place or of failure to
hear correctly. Colored pencils are desirable. Inexperienced scorers should
mark each item in the test being scored in some uniform manner, such as
+ for correct, - for incorrect, and 0 for omitted items. Experienced scorers
will save time by marking only the incorrect and omitted items. It is, of





TEST 3. WORD MEANING

When two words mean the SAME, draw a line under "SAME."
When they mean the OPPOSITE, draw a line under "OPPOSITE."

Samples:
    fall - drop                 same - opposite
    north - south               same - opposite

 1  expel - retain              same - opposite
 2  comfort - console           same - opposite
 3  waste - conserve            same - opposite
 4  monotony - variety          same - opposite
 5  quell - subdue              same - opposite
 6  major - minor               same - opposite
 7  boldness - audacity         same - opposite
 8  exult - rejoice             same - opposite
 9  prohibit - allow            same - opposite
10  debase - degrade            same - opposite
11  recline - stand             same - opposite
12  approve - veto              same - opposite
13  amateur - expert            same - opposite
14  evade - shun                same - opposite
15  tart - acid                 same - opposite
16  concede - deny              same - opposite
17  tonic - stimulant           same - opposite
18  incite - quell              same - opposite
19  economy - frugality         same - opposite
20  rash - prudent              same - opposite
21  obtuse - acute              same - opposite
22  transient - permanent       same - opposite
23  expel - eject               same - opposite
24  hoax - deception            same - opposite
25  docile - submissive         same - opposite
26  wax - wane                  same - opposite
27  incite - instigate          same - opposite
28  reverence - veneration      same - opposite
29  asset - liability           same - opposite
30  appease - placate           same - opposite

Right 18    Wrong 8    Score 10

Figure 6. An Illustration of the Procedure Followed in Scoring Test 3 of the
Terman Group Test of Mental Ability, Form A. (Copyright by World Book Company.)
[Scorer's marginal marks not reproduced.]

Figure 7. A Sample Standard Test Scoring Record. [Figure not reproduced.]


course, unnecessary to mark the items below the last one the pupil attempts.
But it is well to draw a horizontal line across the test under the
last item attempted. Figure 6 illustrates the scoring of an alternative-response
test of word meaning, using the formula Score = R - W (rights minus wrongs).
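
The rights-minus-wrongs formula can be sketched in a few lines of Python. The
sketch is offered as an illustration only; the function names and the sample
paper (18 right, 8 wrong, 2 omitted) are hypothetical. The second scoring
function reproduces the constant error of counting omissions the same as
errors, so that the two results may be compared.

```python
def count_marks(marks):
    """Tally a scorer's marks: '+' correct, '-' incorrect, '0' omitted."""
    return {
        "right": marks.count("+"),
        "wrong": marks.count("-"),
        "omitted": marks.count("0"),
    }


def rights_minus_wrongs(marks):
    """Score = R - W. Omitted items are neither rewarded nor penalized."""
    tally = count_marks(marks)
    return tally["right"] - tally["wrong"]


def faulty_score(marks):
    """A constant error: omissions wrongly counted the same as errors."""
    tally = count_marks(marks)
    return tally["right"] - (tally["wrong"] + tally["omitted"])


# A hypothetical paper, marked only up to the last item attempted.
marks = ["+"] * 18 + ["-"] * 8 + ["0"] * 2

print(rights_minus_wrongs(marks))  # 18 - 8 = 10
print(faulty_score(marks))         # 18 - (8 + 2) = 8
```

Items below the last one attempted are simply left out of the list of marks,
which agrees with the practice of not marking them at all.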

The writers have found that the simple device of keeping a written record
of who marks, checks, transcribes, or totals each part of the test reduces
the likelihood of error. If the scoring is organized systematically, it is a









simple matter to keep such a record on a mimeographed sheet attached to
each package of tests when scored, as shown in Figure 7.

But in spite of these preventive measures, certain errors are likely to
occur. The safest plan, therefore, is to have each set of papers marked
a second time by different scorers, using pencils of a different color. Dunlap
found that items most subject to errors in scoring are of the two-response
type requiring a scoring formula and items requiring the underlining
of more than one word. If a complete rescoring does not seem
practical, a sampling method may be followed. Each fifth or tenth paper, for
example, may be selected and carefully rescored, and if only an occasional
minor error is found, the whole set may be safely accepted. On the other
hand, if frequent or serious errors are found in these sample papers, the
entire set should be rescored. In any event it is important to have some
person other than the original scorer check the totals for each part of the
test and for the whole test, all substitutions in the scoring formulas, all
transcribing of scores, and all transmuting of point scores into derived
scores. It is possible to locate many serious errors by examining closely
the profile of each individual pupil on all tests with this form of record.
Any score much higher or much lower than the general level is suspicious.
Also, when two or more tests are used which purport to measure the same
function, any serious discrepancies should be scrutinized, on the supposition
that a high positive correlation is to be expected. The standard of
absolute accuracy should be accepted by all scorers. The possibilities of
serious injustice to individual pupils by errors in scoring should be fully
recognized.
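
The sampling method of rescoring may be sketched as follows. This is a minimal
illustration in Python, not a prescribed rule: the text speaks only of "an
occasional minor error," so the two thresholds (how many discrepancies, and
how large) are assumptions that each school would have to set for itself, and
the function names and sample scores are hypothetical.

```python
def sample_for_rescoring(paper_ids, every=5):
    """Select each fifth (or tenth, etc.) paper from a set for rescoring."""
    return paper_ids[every - 1::every]


def audit_decision(original, rescored, max_discrepancies=1, max_size=1):
    """Compare original and rescored totals for the sampled papers.

    original, rescored -- dicts mapping paper id to total score
    max_discrepancies  -- assumed meaning of "occasional" (hypothetical)
    max_size           -- assumed meaning of "minor", in points (hypothetical)
    """
    differences = {
        paper: rescored[paper] - original[paper]
        for paper in rescored
        if rescored[paper] != original[paper]
    }
    serious = any(abs(d) > max_size for d in differences.values())
    if serious or len(differences) > max_discrepancies:
        return "rescore the entire set", differences
    return "accept the set", differences


papers = list(range(1, 31))                  # thirty papers, numbered 1 to 30
sample = sample_for_rescoring(papers, every=5)

first_scores = {5: 22, 10: 31, 15: 18, 20: 27, 25: 25, 30: 29}
second_scores = {5: 22, 10: 31, 15: 19, 20: 27, 25: 25, 30: 29}

print(sample)
print(audit_decision(first_scores, second_scores))
```

With one discrepancy of a single point in six sampled papers, the set is
accepted under the assumed thresholds; a larger or more frequent discrepancy
would call for rescoring the whole set.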

E. Analyzing and Interpreting the Scores

After the tests have been scored and checked, the next step is the analysis
and interpretation of the results. Both processes go on together, for analysis
is worthless without interpretation and interpretation is impossible without
analysis. Analysis is of two main types: statistical and graphical. Before
either can be undertaken, however, there is the important preliminary step
of classification and tabulation. An analysis of errors appearing in the test
papers is usually of major importance to the classroom teacher. Chapters 3,
9, and 10 are concerned with a discussion of the whole problem of analysis
and interpretation; only an outline will be given here to indicate the steps
involved:

1. Classification and tabulation of scores

2. Statistical analysis of scores

3. Graphical analysis and representation

4. Use of norms and standards

5. Analysis of errors

Jack W. Dunlap, "The Relationship Between the Type of Question and Scoring
Errors," Journal of Experimental Education, 6:376-379, March, 1938.

Derived scores are obtained from tables of norms. Each point score is expressed
in some equivalent unit such as an age or percentile score. The interpretation
of these units is considered in Chapter 10.


In a complete testing program all five of these steps will receive attention,
although not always to the same degree. If the primary purpose of the
testing program is diagnosis, for example, the fourth step would be relatively
unimportant and the fifth step relatively important. The reverse
would be true of a program whose main objective is a study of the comparative
efficiency of various grades, classes, and schools.


F. Applying the Results

The application of the results is the end of the whole testing program.
Everything that has gone before is really preliminary. Whatever value the
tests are to have depends in the last analysis upon the use made of the
results.

Just what is to be done, of course, depends upon the purpose of the program.
Later chapters will consider in some detail the procedure to be followed
for several administrative and instructional problems. It will be
sufficient at this point to give some idea of how the procedure will vary
with the purpose.

Suppose, for example, that the purpose of the tests is to determine the
present status of a particular school with the idea of its improvement, and
that the test data are before the principal. The question now is, what is
to be done? Upon the basis of the test scores and other pertinent data such
as the teachers' estimates, health reports, age-grade status, and the like,
several pupils are given trial promotions to the next higher grades. A small
group of pupils, whose achievement and intelligence scores are well below
the central tendency of their respective grades, are organized into an
ungraded class and put in charge of a teacher whose outstanding virtues are
sympathy, patience, and common sense. Ability groups are also organized
in a few grades and classes, with appropriate differentiation in curricula
and methods.

Likewise, suppose the primary purpose of the testing program is to determine
whether or not the teaching emphasis is correct in the various subjects
in the grades and, when the test results are in, it is apparent that most
of the grades are strong in arithmetic and spelling, about normal in reading,
and weak in language and the social studies. Now what is to be done here?
The principal calls the teachers together and presents the situation in tables
and graphs, with suitable comments by way of interpretation. Then follows
a regular "council of war." One or more committees are appointed to make
a special study of the situation and to make recommendations at a meeting
to be held a little later. Eventually, after discussion and deliberation, a
course of action is decided upon, looking to the improvement of the situation
in the weaker subjects.





The procedure will again be somewhat different in essential respects if
the primary purpose is diagnosis and remedial work in reading. Here the
test results should be analyzed in some detail in each grade. An analysis
of the test papers, item by item, is often very revealing. Special effort
should be made to locate the specific nature of the reading difficulties.
There may be found some general weaknesses, such as the inability to use
the index and table of contents in a book, or possibly to locate the central
idea in a paragraph. There are usually, in addition, other weaknesses, which
appear in certain pupils and not in others. Some of these will not be revealed
at all by the usual paper-and-pencil reading tests, but will require special
tools and techniques. After considering these facts, the staff will try to plan
a remedial program to be followed during the year.

The essential point in all these cases is that something is done about the
situation revealed by the test scores. To fail to apply the results in some
practical way is to fail in the testing program.

G. Retesting to Determine the Success of the Program

Most testing programs stop with applying the results, if, indeed, they
go that far. But an essential step yet remains. After a reasonable time has
been allowed for a trial of the remedial measures which were agreed upon
in the light of the test data, a checkup should be made to determine the
success of this program. Most tests are not sufficiently accurate to reveal
progress over a shorter period than one half year. As a rule, a second form
of the test or tests used in the beginning should be employed in retesting.
If this is not done, it will usually be very difficult to express the results
in terms sufficiently comparable to make an accurate measure of progress
possible. Of course, not all the gain found can be correctly attributed solely
to the remedial program. Some of it is doubtless due to the practice effect
or to familiarity with the test itself, part of it to teaching received outside
the school, and part of it to natural growth. Often, however, the improvement
will be so marked as to indicate beyond a reasonable doubt the
effectiveness of the program attempted. At other times the improvement will be
disappointingly small. It is then usually wise to modify the remedial program
in the light of the results obtained.

The essential point is that the success of the remedial program must not
be taken for granted. On the contrary, a definite effort must be made to
check upon its effectiveness. To fail to do this is to leave the testing
program incomplete. There is no better reason for taking the efficiency of the
remedial program on faith than there was for taking the earlier results
of teaching on faith.

H. Making Suitable Records and Reports

Certain records and reports are essential to the success of the testing
program. But by no means do all these records and reports come chronologically
at the end of the program. As a matter of fact, some of these are
essential to the last three stages already discussed.

[...] what the tests show: the pupils, the teachers, the administrative officers,
and the parents or public. The nature of the report will naturally vary
somewhat with the group to whom it is made, and the nature of the record
with the specific function it is to serve. However, regardless of the type
of record or its specific function in any particular situation, its general
function is always, as has been well stated by Stenquist, "to present test
results and related information in such a meaningful way as to arouse
interest and action, on the part of teachers, principals, supervisors, directors
of special divisions, and superintendents."

Report to pupils. The pupils have a right to know their performance
on all achievement tests, whether standardized or nonstandardized. In many
cases it is well to go over the papers with the pupils in order to point out
the nature of the errors made. The success of any remedial program will
depend upon the pupils' cooperation. Long ago Thorndike stated the matter
succinctly in these words:


The final justification for every testing regime rests in Mary Jones and John
Smith, and it therefore behooves all persons who are making and giving tests to
take them into partnership as soon and as completely as is feasible.


It is usually considered dangerous to present the results of intelligence
tests to pupils. And there doubtless is more possibility of harm than of
good in making known the mental ages and intelligence quotients of individual
pupils. Difficulty is most likely to result from scores at the extremes
of the distribution. Both pupils and parents can reconcile themselves to low
scores on achievement tests, for that can be explained to their satisfaction
on the ground that it is the school's fault. But low intelligence test scores
seem to reflect directly upon the good name of the family, and this is
resented. Only the exceptional pupil or parent has a fine enough philosophy
of life to reconcile himself to the realities implied by a low score, and to
resolve to make the most of it. There is also danger that the pupils with
high test scores will be so inordinately puffed up as to endanger both their
social standing with their fellows and their academic standing with their
teachers. There are, however, special cases in which information regarding
intelligence scores may properly be given. Some examples of these cases
will be discussed in later chapters.


page 518 Quoted by pcrmi<!sion of Wie s , 

.nJ Tl»- y-to. CoUe,e ^r, 03 H, 

October, 1924 




Figure 10. A Cumulative Record in Graphical Form.

[...] of progress made. A distinguishing feature of this record is the [...]
the percentile ranks.

Such records, although easy to interpret, are somewhat laborious to prepare.
Han[?]en found that the graphical record required twice as much
time to prepare as the numerical record and was somewhat more subject
to error. This is perhaps largely responsible for the discovery quite a few
[...] schools which held membership
in the Educational Records Bureau made a graph of all test results. Care
must be exercised that such records do not become ends in themselves and
that not so much time is devoted to their preparation that none remains
for their practical use. Schools with limited resources and little clerical
assistance should be content with less elaborate record systems than those
which may be feasible for larger and wealthier schools.

Warnings concerning profiles. Because of their deceptive simplicity,
profiles invite improper interpretations. Three precautions must be kept
in mind:

1. Every point on an individual's profile should be based upon the same norm
group as every other point, or at least upon highly similar groups. It is dangerous
and misleading, for example, to have the percentile point representing the pupil's
score on a clerical aptitude test determined by comparison with the scores of
"employed males" while the percentile point for his mechanical aptitude test score
is based upon "engineering college freshmen." Similarly, the student taking both
Latin and the required course in English is usually competing with rather select
students in the first class. If he has a percentile rank of 50 (exactly average) in
Latin and 75 in English, this does not necessarily mean that his knowledge of
English is superior to that of Latin. The typical student in the Latin class may
have a percentile rank of 75 in English when compared with all students in the
English class. This problem does not arise with a battery of tests that have all
been standardized upon the same group. It is minimized when some procedure for
obtaining equivalent scores is available.

2. Since differences between scores are less reliable than the separate scores
themselves, only large discrepancies in the individual's profile should be interpreted
as indicating better achievement in one area than in the other. Many teachers try
to use for diagnostic purposes slight or moderate differences that may be due
entirely to chance. With a group profile based upon average scores in a school grade
of even as many as 30 students, however, the differences do not need to be as large
in order to be both statistically and educationally significant, for averages are much
more reliable than the scores from which they are obtained.

3. Percentile ranks do not form an equal-unit scale, so they should be spread
out at both ends and condensed in the middle when being used as the basis for
points in a profile. See Figure 37 on page 297.
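
The third precaution can be illustrated by a short Python sketch. It rests on
an assumption not stated in the list above, namely that the trait is normally
distributed, so that each percentile rank may be placed on the profile at its
corresponding normal deviate (z-score). The function name is hypothetical, and
the sketch is one common way of spacing the ranks rather than the method of any
particular test manual.

```python
from statistics import NormalDist


def profile_position(percentile_rank):
    """Place a percentile rank on an equal-unit scale.

    Assumes a normal distribution (an assumption of this sketch): the
    position is the z-score below which the given percentage of cases fall.
    """
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must lie strictly between 0 and 100")
    return NormalDist().inv_cdf(percentile_rank / 100)


# Nine percentile points near the middle versus nine near the upper end.
middle_gap = profile_position(59) - profile_position(50)
upper_gap = profile_position(99) - profile_position(90)

print(round(middle_gap, 2))
print(round(upper_gap, 2))
```

On this scale the distance from the 90th to the 99th percentile is several
times the distance from the 50th to the 59th, which is what is meant by
spreading the ranks out at the ends and condensing them in the middle.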


A master's thesis, summarized in Charles C. Peters (editor), [...]
Records Bulletin, 25:8-9, 1938.

Figure 11. Centile Sheet for College Men and Women in Allport, Lindzey, and
Vernon, A Study of Values. Based upon the Norms (851 Men, 965 Women) in the
Manual of Directions (Boston: Houghton Mifflin Company, 1951). The Six Scores of
John Doe are Plotted.


Figure 11 illustrates the above three precautions and also shows how sex
differences can be taken into account. "Study of Values" scores are shown
for a certain college sophomore who secured 67 points on the theoretical
scale, which is above the 99th (per)centile, and only 20 points for the
economic value. Clearly, this man's dominant value among the six is
theoretical. Whether he actually has less of the economic attitude than of
the political cannot be decided from this profile, since these two centiles
(1 and 3) are negligibly different. Likewise, it is hazardous to say that he
is less religious than aesthetic, or less aesthetic than social, for these three

An explanation of the basis for this table is contained in Julian C. Stanley,
"Study of Values Profiles Adjusted for Sex and Variability Differences,"
Journal of Applied Psychology, 37:472-473, December, 1953.

values all occur rather close together near the middle of the profile.
Therefore, we may say only that he is highest on theoretical and social,
lowest on economic and political, and not particularly high or low on
religious and aesthetic.

Reports to parents or public. Only a few schools make a systematic
effort to keep the public informed regarding the educational progress of its
schools. Results of the testing program might very well be summarized
before the Parent-Teacher Association, women's clubs, luncheon clubs, and
similar organizations. Slides and charts, illustrating the nature of the tests,
with analysis and interpretation of the records of typical pupils, would be
instructive. The cumulative record cards are naturally of great value in
conferences with parents regarding the educational program of their children.
Hilkert points out clearly how this may be done. Unless parents,
as well as teachers, administrators, and students, participate in the
interpretation of test results and the planning of action based at least partly
upon them, the testing program will not be optimally effective. A further
discussion of the use of measurement in programs of public relations is
given in Chapter 10.

Robert N. Hilkert, "Parents and Cumulative Records," Educational Record,
21:172-183, Supplement No. 13, January, 1940.

Selected References for Further Reading

Bennett, George K., Seashore, Harold G., and Wesman, Alexander G., A Manual
for the Differential Aptitude Tests (Second Edition). New York: The Psychological
Corporation, 1952. 77 pages.

Buros, Oscar K. (Editor), The Fourth Mental Measurements Yearbook. Highland
Park, New Jersey: Gryphon Press, 1953. 1163 pages.

Buros, Oscar K. (Editor), Succeeding Mental Measurements Yearbooks, published by
the author, Highland Park, New Jersey.

Coleman, William, Test Results for Curriculum Study. Annual Report of the Tennessee
State Testing Program, 1950-51. Nashville, Tennessee: Tennessee State Department
of Education, 1951. 18 pages.

Coleman, William, and Cobb, E. W., The Guidance Use of Test Results. Knoxville,
Tennessee: The Tennessee State Testing Program, University of Tennessee,
November, 1951. 47 pages.

Cronbach, Lee J., Essentials of Psychological Testing. New York: Harper &
Brothers, 1949. Chapters 4 and 5, "How to Choose Tests" and "How to Give
Tests."

Flanagan, John C., Adkins, Dorothy, and Cadwell, Dorothy H. B., Major Developments
in Examining Methods. Chicago: Civil Service Assembly, 1313 East 60th
Street, November, 1950. 24 pages.

Jordan, A. M., Measurement in Education. New York: McGraw-Hill Book Company,
1953. Chapter 4, "The Testing Program: Achievement-Test Batteries."

Lindquist, E. F. (Editor), Educational Measurement. Washington: American Council
on Education, 1951. "Planning the Objective Test," by K. W. Vaughn;
"Administering and Scoring the Objective Test," by Arthur E. Traxler;
"Reproducing the Test," by Geraldine Spaulding; and "Performance Tests of
Educational Achievement," by David G. Ryans and Norman Frederiksen.

National Committee on Cumulative Records, Handbook of Cumulative Records.
Washington, D. C.: U. S. Office of Education, 1941. 104 pages.

Stephenson, William, Testing School Children, An Essay in Educational and Social
Psychology. New York: Longmans, Green and Company, 1949. Chapter IX,
"Principles and Practice of Selection."

Super, Donald E., Appraising Vocational Fitness by Means of Psychological Tests.
New York: Harper, 1949. Chapters IV and V, "The Nature of Aptitudes and
Aptitude Tests" and "Test Administration and Scoring."

Traxler, Arthur E., Jacobs, Robert, Selover, Margaret, and Townsend, Agatha,
Introduction to Testing and the Use of Test Results in Public Schools. New York:
Harper & Brothers, 1953. 113 pages.

Worcester, D. A., "A Misuse of Group Tests of Intelligence in the School,"
Educational and Psychological Measurement, 7:779-781, Winter, 1947.

World Book Company, Yonkers-on-Hudson, New York. The following free publications,
most of them reprints of articles from professional journals, may be of
interest in connection with this chapter:

Durost, Walter N., "What Constitutes a Minimal School Testing Program,"
Test Service Notebook No. 1.

Durost, Walter N., "Tests and the Junior High School Guidance Counselor,"
Test Service Notebook No. 2.

Lewin, Lillie, "Pupil Adjustment Through Measurement," Test Service Bulletin
No. 40.

Burnside, Carolyn J., "Improving the Reading of Seventh Graders," Test
Service Bulletin No. 44.

Super, Donald E., "The Place of Aptitude Testing in the Public Schools,"
Test Service Bulletin No. 49.

Starkey, Mary L., "Determining Individual Needs and Capacities Through
Testing," Test Service Bulletin No. 56.

Stenquist, John L., "Growth," Test Service Bulletin No. 59.

Bridges, Claude F., "Some Basic Considerations in Determining the Significance
of Achievement Test Results," Test Service Bulletin No. 66.

Brown, Woodrow A., "Testing in Pennsylvania's Public Kindergartens," Test
Service Bulletin No. 67.



9 


The Graphical Representation of 
Educational Data 


A. The Value of Graphs

"One picture is worth ten thousand words." So runs an old Chinese
proverb. "There is a magic in graphs," says a modern writer. He describes
the dynamic role of the graphical representation of numerical data as
follows:

Words have wings, but graphs interpret. Graphs are pure quantity stripped of
verbal sham, reduced to dimension, vivid, unescapable. Wherever there are
data to record, inferences to draw, or facts to tell, graphs furnish the unrivaled
means whose power we are just beginning to realize and to apply.

There can be little doubt that the graphical representation of educational
data is a valuable supplement to statistical analysis and summarization.
The psychological value of graphs in the testing program may be
considered under three headings: They attract attention, they clarify the
meaning, and they aid retention.

Graphs attract attention. In the first place, the graph or chart tends
to attract the reader's attention. Advertisers employ a wide variety of
pictures, charts, and diagrams, for they realize that the first step in making
a sale is to attract the prospective customer's attention. They have learned
that pictures will do this where numerical data and printed material will
not. The average reader is likely to give scant attention to the ordinary
printed matter in a school report and be wholly unimpressed by the appalling
mass of tabular data often piled up at the end, but his eye is likely
[...] few [...]. There is evidence that
school administrators are beginning to learn this lesson.

Henry D. Hubbard, quoted by W. C. Brinton, Graphic Presentation, page 2. New
York: Brinton Associates, 1939.

[Figure 12. "Back to School": enrollment in the United States, 1900-1953.
Legend entries include higher education and secondary schools.]

Grnidia elnril-j points. In the second place, the graph is often an efiee 
tn e method of clarifj mg a point One small chart n .11 often make a point 
clearer than a dozen tables or paragraphs It is sometimes said that the 
facts speak for thcmsclt cs In realilj , statistics often stand speechless and 
silent, tables arc tongue tied and onls the chart cries aloud its message 
toallthenorld Ordinary mimcncal data are quite abstract they convey 
their meaning t agilely and ivith effort to the average mind The picture 
or Knpfi IS a more conrreto representation of the matter 


Educitiom! fact« such as eomp-iratne enrollment figures o\era 53 year 
period, maj bo presented cfTcctivelj bj graphical means, as sho^m in 
Figure 12 There both bar and pic charU are used to contrast the 17 2 mil 
lion persons enrolled m five tjpes of schools in 3000 inth the 34 4 milhon 
enrolled m 1953 TIic bar charts show actual numners, while the two pie 
charts hi\c percentage slices 

Figure 13 represents strikingly the enormous inequalities among states
in the support of public education.³

A wide variety of charts are shown in Figure 14. The basic information
concerning "Motor Buses in Operation in the United States" is given first
in tabular form, followed by 15 different black-and-white charts.⁴

Graphs aid retention. It has been found that the graphical presentation
of certain types of data is a definite aid to recall. Washburne⁵ compared
the efficiency of graphical, tabular, and textual modes of presenting
historical data to pupils in the junior high school. The material, which
dealt with certain specific quantitative facts, was kept constant, but the
mode of presentation varied. Sometimes it appeared as a statistical table;
sometimes as a bar graph, a pictograph, or a line graph; and at other times
it was presented in ordinary paragraph form. Among the conclusions arrived
at by the author were the following:⁶


1. The paragraph is in general the form which is least favorable to recall
of quantitative data, whether general or specific.

2. The bar graph is the form most favorable to the recall of relative
amounts (static comparisons) when the comparisons called for involve a fair
degree of difficulty. For very simple data some form of pictograph may be
more favorable to the recall of relative amounts than the bar graph.

3. The line graph is the form most favorable to the recall of relative
increase, decrease, and fluctuation (dynamic comparisons).

4. The statistical table is the form most favorable to the recall of
specific amounts.

² Cf. Douglas E. Scates, "Reporting, Summarizing, and Supplementing
Educational Research," Review of Educational Research, 12:558-574,
December 1942.

³ John K. Norton and Eugene S. Lawler, Unfinished Business in American
Education, page 13. Washington: American Council on Education, 1946.

⁴ This material appears on the first two inside pages of Mary Eleanor
Spear's Charting Statistics. New York: McGraw-Hill Book Company, Inc.,
1952. 233 pages.

⁵ John Noble Washburne, "An Experimental Study of Various Graphic,
Tabular, and Textual Methods of Presenting Quantitative Material," Journal
of Educational Psychology, 18:361-476, September and October 1927.

⁶ Ibid., page 475.

[Figure 13. Chart of the inequalities among states in the support of public
education, expressed per classroom unit.]

[Figure 14. Chart types illustrated by Mary Eleanor Spear, including the
index chart, the logarithmic chart, the subdivided column, the paired bar,
the bar and symbol, the column and curve, the pictorial surface, and the
pie chart. (Copyright, 1952; reproduced by permission from McGraw-Hill Book
Company, Inc.)]

One study⁷ on graph interpretation in the elementary schools points out
that little is known regarding the comparative value of various graphs,
although the circle graph appears to be easiest and the line graph most
difficult, with the bar graph occupying an intermediate position. Her
results indicated that a mental age of 14 years was required for the
satisfactory interpretation of bar and line graphs without specific
instruction in reading materials presented in graphical form.

These findings appear to be in line with a principle of learning
abundantly supported by experimental evidence, namely, that the method of
presentation which makes the meaning clearest is most favorable to learning
and recall. It is important to recognize that neither statistical nor
graphical methods bestow precision upon data. They are merely useful ways
of expressing whatever accuracy exists.

B. Representing the Record of an Individual

There is no more striking way of representing the test record of an
individual pupil than by means of a graph. Such a graphical picture of the
strong and weak points of a single person is called a profile. Sometimes
the term psychograph is used. Many publishers of standard tests provide
blank forms for showing these profiles. Usually they appear on the first
page of the test, where they can easily be detached for filing.

Profiles of a single subject. Figure 15 shows the profile of a sixth-grade
pupil on the Iowa Silent Reading Tests, New Edition. The broken-line
profile for the class, based on medians, is also shown. With a median
standard score of 150, corresponding to a percentile rank of 49 for the
eighth month of the sixth grade, John is an average reader. The median
score of his class is 151, which has a percentile rank of 52. John scored
highest on Tests 4 (Paragraph Comprehension) and 6A (Alphabetizing),
poorest on Test 6B (Use of Index). He exceeded the class average
considerably on Tests 1R (Rate) and 6A (Alphabetizing).

Profiles for a series of subjects. Profiles are especially useful in
representing a pupil's record on two or more subjects. Most test batteries
provide a convenient form for such a profile. Figure 16 shows the profile
for a tenth-grade pupil on the California Achievement Tests, Advanced
Battery. This student scored highest on syntax (Grade Placement = 15) and
lowest on spelling (GP = 6.7). His best area is Test 5, Mechanics of
English and Grammar (GP = 12.5). On the complete battery he has a
dead-center average percentile rank of 50 and a grade placement of 10.6.

After a period of remedial instruction, the purpose of which is to
strengthen the weak points of individual pupils, it is a good practice to
give a second form of the same test. A second profile drawn in a different
color upon the same sheet is one of the best ways of revealing the progress
made, if changes are interpreted cautiously.

⁷ Sister Clara Francis Bamberger, "Interpretation of Graphs at the
Elementary School Level," Educational Research Monographs, 13:1-62,
May 1, 1942.




[Figure 16. The Profile of a Tenth-Grade Pupil on the California
Achievement Tests. Sample diagnostic profile, complete battery: a test
given in January to a 10th-grade student; age, 183 months; mental age, 152
months. (Reproduced by permission of the California Test Bureau.)]




Figure 17. Profiles for a Student Tested in the Fifth and Sixth Grades.
(Reproduced by permission of Educational Test Bureau.)


Figure 17 is a "Normal Progress Chart" issued by the Educational Test
Bureau for use with their Coordinated Scales of Attainment. It shows two
profiles for Mary L., one (solid line) based upon her performance on
Battery 5, Form A, on January 18 in the fifth grade and the other (broken
line) for Battery 6, Form B, administered at the same point in the sixth
grade. The grade equivalents of her scores varied from 5.2 for science in
the fifth grade to 7.9 for English and literature in the sixth grade.
Mary's overall percentile ranks in the fifth and sixth grades are 55 and
60, respectively, while the percentile rank equivalent of her
Kuhlmann-Finch IQ of 111 is 75. Thus, even though she is slightly above the
average of the class in achievement, Mary seems to be working somewhat
below her potential ability to attain.

C. Representing a Frequency Distribution

The ordinary frequency distribution does not give a very clear picture
of the situation. There are three common methods of representing a
distribution of scores graphically: the histogram or column diagram, the
frequency polygon, and the smooth curve.

The histogram or column diagram. The histogram is a series of columns,
each of which has as its base one class interval and as its height the
number of cases, or frequency, in that class. Figure 18 represents a
histogram showing the distribution of percentage values assigned to an
arithmetic paper by forty-two scorers. As the greatest frequency is 9, in
the 59.5-64.5 class, it is not necessary to extend the vertical or
frequency scale at the left above 9. As the scores range from the 29.5-34.5
class to the 74.5-79.5 class, it is necessary to represent the horizontal
scale only through that distance. It is customary, however, to extend the
scale one class interval above and below that range. In order to avoid
having the figure too flat or too steep, it is usually well to arrange the
scales so that the width of the figure is about one and two-thirds times
its height; that is, the ratio of height to width should be approximately
3:5. In actual practice it is customary to represent the histogram in
outline form, rather than to show the full length of the columns. Figure 19
illustrates the shaded outline form of the histogram.

Figure 18. A Histogram, or Column Diagram, Showing the Percentage Values
Assigned to an Arithmetic Paper by Forty-Two Scorers (horizontal axis:
percentages assigned)
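The tallying that lies behind a histogram can be sketched in a few lines of
Python. This is only an illustration: the twelve scores are hypothetical
(they are not the forty-two values plotted in Figure 18), and the function
name `tally` is an invented label. Only the class limits (29.5-34.5 up to
74.5-79.5, in steps of 5) and the 3:5 ratio of height to width follow the
text.

```python
import math

def tally(scores, lowest_limit=29.5, width=5.0, n_classes=10):
    """Count how many scores fall in each class interval.

    Class k runs from lowest_limit + k*width to lowest_limit + (k+1)*width.
    Returns a list of ((lower, upper), frequency) pairs, lowest class first.
    """
    counts = [0] * n_classes
    for s in scores:
        k = math.floor((s - lowest_limit) / width)
        if not 0 <= k < n_classes:
            raise ValueError(f"score {s} lies outside the class intervals")
        counts[k] += 1
    return [((lowest_limit + k * width, lowest_limit + (k + 1) * width), c)
            for k, c in enumerate(counts)]

# Hypothetical percentage marks given to one paper by twelve scorers.
scores = [32, 45, 52, 58, 60, 61, 63, 64, 66, 70, 72, 78]
table = tally(scores)
for (lower, upper), freq in table:
    print(f"{lower:4.1f}-{upper:4.1f}  {freq}")

# Height of the drawing for a chosen width, using the 3:5 ratio.
width_in = 5.0
height_in = width_in * 3 / 5
```

The frequencies in `table` are the column heights; the tallest column fixes
how far the vertical scale must extend.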



Figure 19. A Histogram, or Column Diagram, Representing the Distribution of
the 83 IQ's in a Small Junior High School

The frequency polygon. The process of constructing the frequency polygon
is very much like that of constructing the histogram. In the histogram, the
top of each column is indicated by a horizontal line the length of one
class interval, placed at the proper height to represent the frequency at
that class. But in the polygon a point is located above the mid-point of
each class interval and at the proper height to represent the frequency at
that class. These points are then joined by straight lines. As the
frequency is zero at the classes above and below those in the distribution,
the polygon is completed by connecting the points that represent the
highest and lowest classes with the base line at the mid-points of the
class intervals next above and below. Figure 20 shows a polygon for the
same data represented by a histogram in Figure 18.
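The location of the polygon's points can be sketched as follows. The
function name `polygon_points` and the three frequencies are hypothetical;
the rule itself (a point above each mid-point, closed on the base line one
class interval beyond each end) is the one just described.

```python
def polygon_points(lowest_limit, width, frequencies):
    """Return the (mid-point, frequency) vertices of a frequency polygon.

    A zero-frequency vertex is added at the mid-point of the class just
    below the lowest class and of the class just above the highest one, so
    that the polygon closes on the base line.
    """
    first_mid = lowest_limit + width / 2
    points = [(first_mid - width, 0)]
    for k, freq in enumerate(frequencies):
        points.append((first_mid + k * width, freq))
    points.append((first_mid + len(frequencies) * width, 0))
    return points

# Hypothetical frequencies for the classes 29.5-34.5, 34.5-39.5, 39.5-44.5.
points = polygon_points(29.5, 5.0, [1, 2, 4])
print(points)
```

Joining these vertices in order with straight lines gives the polygon.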

The smooth curve. Sometimes a smooth curve is drawn instead of the
histogram or frequency polygon. The only difference is that for the former
a smooth curve is drawn through the points, and for the latter two figures
a jagged line is used. The most common use in educational measurement of a
smooth curve is in the so-called normal curve. Figure 21 shows such a curve
superimposed upon a histogram representing the actual distribution of
ninth-grade pupils on eleven intelligence tests.



Figure 21. An Actual Curve Compared with the Theoretical Curve of
Probability. (Histogram based upon curves for eleven well-known group
intelligence tests ... Bureau of ...)


There is one smooth curve, however, which is widely used in representing
test scores. This is the percentile curve, or ogive. Figure 22 shows a
percentile curve used to represent the percentage data already employed to
illustrate the histogram and the polygon. The points that determine the
percentile curve are located on the horizontal line at the upper limit of
each class, at the position that indicates on the horizontal scale the
percentage of scores up to and including that class. It will be noted,
also, that two columns have been added to the ordinary frequency table. The
cumulative frequency column indicates the number of scores up to and
including each ... in the 35-39 class, making a cumulative frequency of 3
in the two classes. The cumulative per cent column shows what percentage
each of these cumulative frequencies is of the total. In the illustration
the total is 42. The first entry in this column is, of course, 100; the
second is 98, because 41 is 98 per cent of 42; the third is 95, because 40
is 95 per cent of 42; and so on for the others. Each value in the
cumulative per cent column is represented as a point on the upper limit of
that class interval (the horizontal line separating that class from the
class above it), since it includes the percentage of scores up through that
class. These points determine the curve. As a rule, especially in small
groups where irregularities are most likely to occur, it is best to miss
some of the points in order to obtain a smooth and regular curve, but care
should be exercised in order to leave about as many points on one side of
the line as on the other. Figure 23 shows another ogive. Such a smoothed
curve, although it does not exactly represent the actual sampling, probably
indicates very closely what is to be expected "in the long run."

Figure 22. A Percentile Curve Representing the Percentage Values Assigned
to an Arithmetic Paper by Forty-Two Scorers

Figure 23. A Percentile Curve Representing the Distribution of IQ's in a
Junior High School (see Figure 19). The Values of Q1, Median (Q2), and Q3
Read from the Curve Are Shown (with the Computed Values in Parentheses)

Figure 24. Negative and Positive Skewness
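The two added columns, and the reading of quartiles from the curve, can be
sketched in Python. The frequencies below are hypothetical: they were
chosen only to agree with the facts given in the text (42 scores, a
greatest frequency of 9 in the 59.5-64.5 class, cumulative frequencies of
3, 40, 41, and 42 where the text mentions them). The function names are
invented, and straight-line interpolation stands in for the hand-smoothed
curve.

```python
def cumulative_table(frequencies):
    """frequencies: class frequencies, lowest class first.

    Returns (cumulative frequencies, cumulative per cents rounded to whole
    numbers), each aligned with the input classes.
    """
    total = sum(frequencies)
    cum, running = [], 0
    for freq in frequencies:
        running += freq
        cum.append(running)
    pct = [round(100 * c / total) for c in cum]
    return cum, pct

def percentile_point(p, lowest_limit, width, frequencies):
    """Score below which p per cent of the cases fall, found by straight-line
    interpolation between the points of the unsmoothed ogive."""
    total = sum(frequencies)
    target = p / 100 * total
    below = 0
    for k, freq in enumerate(frequencies):
        if freq > 0 and below + freq >= target:
            lower = lowest_limit + k * width
            return lower + (target - below) / freq * width
        below += freq
    return lowest_limit + len(frequencies) * width

# Hypothetical distribution of 42 scores in ten classes, 29.5-34.5 to 74.5-79.5.
freqs = [1, 2, 3, 4, 6, 8, 9, 7, 1, 1]
cum, pct = cumulative_table(freqs)
q1 = percentile_point(25, 29.5, 5.0, freqs)
median = percentile_point(50, 29.5, 5.0, freqs)
q3 = percentile_point(75, 29.5, 5.0, freqs)
print(cum, pct, q1, median, q3)
```

Values read from a smoothed curve will differ slightly from these computed
ones, which is the comparison Figure 23 is said to show.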

Symmetrical and skewed curves. Regardless of whether a distribution is
represented as a histogram, a polygon, or a smooth curve, the curve will be
either symmetrical in shape, or else pushed or pulled to the right or left.
A symmetrical curve is balanced in the center and slopes regularly in both
directions. One that is pushed or pulled in one direction is said to be
skewed. If the peak of the curve is toward the upper end of the scale, with
the longest slope downward toward the lower end of the scale, the curve is
negatively skewed (skewed to the left). On the other hand, if the peak of
the curve is toward the lower end of the scale, with the longest slope
toward the higher end of the scale, the curve is positively skewed, or
skewed to the right. Both kinds of curves are shown in Figure 24. Many


Il5-3k9 

115-119 

iio-ii4 
105-109 
100-104, 
95- 9? 

80- 64 
75- 79 


XX 

XX 

xxxxx 

xxxxxxxx 

xxxxx 

xxxxxxxxxxxxxxxx 

xzxxxxxxxxxx 

xxxxxxxxxx 

xxxxxxxx 

xxxxxx 

xxxxxxxx 

xxxxxx 


Figure 25. Bar Graph Made on the Typewriter, Showing the Distribution of
91 IQ's in a Junior High School
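A bar graph of the typewriter kind, one mark per case, is easy to produce
by program as well. The sketch below is an illustration only: the function
name `typewriter_bars` is invented and the frequencies are hypothetical,
not the data of Figure 25.

```python
def typewriter_bars(classes, mark="x"):
    """classes: list of (label, frequency) pairs, highest class first.

    Returns one text line per class: the label, right-aligned, followed by
    one mark for every case in that class.
    """
    label_width = max(len(label) for label, _ in classes)
    return [f"{label:>{label_width}}  {mark * freq}" for label, freq in classes]

# Hypothetical IQ distribution (not the data of Figure 25).
iq_classes = [("115-119", 2), ("110-114", 5), ("105-109", 8),
              ("100-104", 6), ("95-99", 3)]
lines = typewriter_bars(iq_classes)
print("\n".join(lines))
```

Because every mark has the same width, the length of each row is
proportional to the frequency, just as in the typed figure.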



... some selective factors are operating.



Figure 26. Bar Graph Made on the Typewriter, Showing the Percentage of
Pupils of Each Age Group Who Were Graduated from High School and the
Percentage Who Entered High School but Did Not Graduate


Which graph is best? As is to be expected, no one type of graph is
equally good for all purposes. The histogram is the easiest of all to
understand and is usually best if but one distribution is being
represented. If two or more distributions are to be compared, however,
polygons are usually better, since so many lines coincide when histograms
are superimposed that the picture is likely to be confusing. The percentile
curve has many advantages not possessed by other curves. The first of these
is that it is possible to estimate with a high degree of accuracy the
quartiles, medians, and other similar points. This means that one can read
directly from the curve percentile measures like those illustrated in
Figure 23. As will be shown in the next section, by means of percentile
curves several groups can be presented, for convenient comparison, on a
single sheet. The principal value of bar graphs, circle graphs, and picture
graphs lies probably in school publicity and in the motivation of learning.
"A successful graph," as Scates points out, "depends far more on careful
thought and judgment than on techniques."

D. Representing Two or More Distributions

There are many occasions when it is desirable to compare two or more
distributions. For example, school administrators may wish to compare the
intelligence or achievement of the pupils in various classrooms or
buildings. The overlapping among the various grades within a single
building is a striking way to present the need for individualized
instruction and varied materials.

Representing entire distributions. When it is important to compare two
or more entire distributions, as would be the case in a study of the status
of a school or school system, the choice will usually lie between the
frequency polygon and the percentile curve. The difficulty of superimposing
two or more histograms has already been pointed out. A series of polygons
may be drawn on the same sheet, one above the other, or alongside each










scores on 



three grades in reading ability. But even with only three distributions the
lines cross and recross so many times as to make any accurate comparison of
one grade with another somewhat difficult. More than three classes can
hardly be represented in the same graph by frequency polygons without
considerable confusion. It is also difficult to compare distributions
accurately where the numbers of cases vary greatly, unless each frequency
is represented as a per cent of its total.
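Expressing each frequency as a per cent of its own total is a one-line
computation. In the sketch below the function name `to_per_cents` and both
sets of frequencies are hypothetical; they are chosen so that two groups of
unequal size turn out to have the same shape of distribution.

```python
def to_per_cents(frequencies):
    """Express each class frequency as a per cent of the group's total."""
    total = sum(frequencies)
    return [100 * freq / total for freq in frequencies]

# Hypothetical frequencies in the same four class intervals.
small_group = [2, 6, 10, 2]    # 20 pupils
large_group = [5, 15, 25, 5]   # 50 pupils

small_pct = to_per_cents(small_group)
large_pct = to_per_cents(large_group)
print(small_pct, large_pct)
```

Plotted as raw frequencies, the larger group would tower over the smaller;
plotted as per cents, the two polygons coincide.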

The use of percentile curves. For the graphic comparison of two or more
distributions the percentile curve has certain outstanding advantages.
Since the frequencies are reduced to per cents, it is readily possible to
compare groups of unequal size. Another important advantage is that several
distributions can be represented in a single graph without difficulty or
confusion. Figure 29 shows the distribution of reading comprehension scores
for the same grades as in Figure 28 in the form of percentile curves. From
these percentile curves several relationships are observable that were not
apparent in the polygons. It is quite clear that although the seventh and
eighth grades have almost exactly the same average scores, the eighth grade
has greater variability. This is evident from the fact that the upper half
of the eighth grade exceeds the upper half of the seventh grade, but that
the lower half of the eighth grade falls behind the lower half of the
seventh.

Figure 29. Total Comprehension Scores on the Iowa Silent Reading Tests for
the Seventh, Eighth, and Ninth Grades

Furthermore, although the ninth grade runs rather consistently above the
other two grades, about I*) percent of the ninth-grade pupils fall below
the median of the seventh and eighth grades.




central tendoncos or *’« ""ly the 

f«.. ng,.m 30 ehons n SnetcU.™^ L’’™T 'r"? " "" 

"tent It .hons three sronps oL n L no it 1 “ 

,nn,.l inonledtte of pro^^;, t 

It " he noterl that after thosooond trrnl the 

11.13 Clear l"“«'e<lge po^cssed A simple line grap^males 



Figure 31. Correct and Incorrect Location of the Norms in a Line Chart
Showing Median Scores on a Reading Test


Another common use of the line graph is for comparing two or more
schools through several grades, or one school with the norms on a test.
Figure 31 shows the correct and the incorrect construction of such a graph.
The solid line connects the median scores on a reading test for grades four
to nine, inclusive. The tests were given in October, or one-tenth of the
way through the grade. The dash line connects the norms incorrectly drawn
from norms in the manual for the end of the grade. The dot-dash line
connects the norms at the proper grade location. It will be noted that when
the line is incorrectly located only the seventh grade appears to exceed
the norm, whereas in reality every grade does. Here the horizontal axis is
considered a scale, and the points determining the lines are located with
reference to it.
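The difference between the two placements of the norm line can be put in
numbers. The sketch below rests on an assumption the text does not make,
namely that growth between successive end-of-grade norms is roughly linear,
so that the norm one-tenth of the way through a grade can be estimated by
straight-line interpolation. The norm values and the function name
`norm_at` are hypothetical.

```python
def norm_at(grade, fraction, end_of_grade_norms):
    """Estimate the norm at a point part-way through a grade.

    end_of_grade_norms maps a grade to the norm at the END of that grade.
    fraction is the part of the school year completed (0.1 for October).
    Straight-line growth within the grade is assumed.
    """
    start = end_of_grade_norms[grade - 1]   # end of the previous grade
    end = end_of_grade_norms[grade]         # end of this grade
    return start + fraction * (end - start)

# Hypothetical end-of-grade norms for a reading test.
norms = {3: 30.0, 4: 40.0, 5: 48.0}

october_norm = norm_at(4, 0.1, norms)   # proper location for October testing
misplaced_norm = norms[4]               # end-of-grade norm used by mistake
print(october_norm, misplaced_norm)
```

A fourth-grade median of 35 in October would lie below the misplaced norm
of 40 but above the properly located norm of about 31, which is the kind of
reversal Figure 31 illustrates.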




Grade equivalent values above 10.0 are extrapolated values and not to be
interpreted as signifying the typical performance of pupils of the
indicated grade placement. (See Directions for Administering.)






Figure 32 shows the profiles for the seventh, eighth, and ninth grades of
a certain junior high school, made by connecting the median scores on each
part of the Stanford Achievement Test. This figure shows clearly that the
school is weak in spelling, arithmetic computation, and study skills, and
particularly strong in the social studies. It is evident that this school
is stressing the content subjects at the expense of some of the more formal
tool subjects. Whether or not this appears to be a desirable emphasis
depends upon one's philosophy of education.

Representing the central tendencies and variabilities of a series of
distributions. The variabilities as well as the central tendencies of a
series of distributions may be shown in a similar manner by line graphs.
Figure 33 is an illustration. This figure shows Q1, the median, and Q3 for
each grade from four to nine ... The three lines have the same general ...
seventh grade, where the variability is ... possible to include from the
table of norms the ... typical school, but to do so would make the figure
too complicated for easy interpretation.



Figure 34. The Central Tendency and Variability in Educational Age of
Grades 2B to 9A, Inclusive, in a Small City School System


Figure 34 is a bar graph which shows the central tendency and variability
in educational age of grades 2B to 9A, inclusive, in a small city school
system.⁹ In each grade the vertical line indicates the total range, the
vertical bar indicates the range of the middle 50 per cent, and the middle
of the bar is the approximate position of the median. The horizontal lines
across the full width of the graph indicate the norms for the beginning of
each grade. It will be noted that the part of each bar which is
crosshatched indicates the proportion that is above the grade norm, while
the shaded part is the proportion that is below the grade norm. The
overlapping is especially marked from 7B to 9B. This condition suggests the
advisability of trying to find out whether these ninth-grade classes
happened to be weaker than usual, or whether the teaching emphasis was
responsible for the apparent lack of improvement. This type of graph is an
effective means for presenting the essential features of a total situation.
Here the amount of overlapping is impressive. It will be noted, for
example, that those whose EA is 12-0 are found in all grades from 5B to 9A,
and that pupils classified in 8A vary in EA from just above the 4A level to
almost the 10A level.

⁹ Report of the Public Schools of Shelbyville, Kentucky, page 73. Bulletin
of the Bureau of School Service, Vol. I, No. 1. Lexington: University of
Kentucky, 1928.
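The three quantities drawn for each grade in a chart of this kind (total
range, middle 50 per cent, and median) can be computed as sketched below.
The educational ages are hypothetical, the function name `bar_summary` is
invented, and the quartiles are taken with the "inclusive" method of
Python's `statistics.quantiles`, which is one of several accepted
conventions and not necessarily the one used for Figure 34.

```python
import statistics

def bar_summary(values):
    """Return the quantities plotted for one grade: the total range
    (vertical line), the middle 50 per cent (bar), and the median."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"low": min(values), "q1": q1, "median": median,
            "q3": q3, "high": max(values)}

# Hypothetical educational ages, in months, for the pupils of one class.
ages = [120, 126, 130, 132, 138, 140, 144, 150, 156]
summary = bar_summary(ages)
print(summary)
```

Repeating the computation for every grade and plotting the results side by
side shows at a glance how far the grades overlap.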


General Suggestions for Constructing Graphs

Varied practice. A wide diversity of practice will be found in the
construction of graphs as used in psychology and education. ... title ...
placed at the beginning ... in capital letters, as ... initial letters of
all important words are capitals, and again ... word in the title is
capitalized unless there are proper ... for capitalization apply. The
second of ... perhaps most common.

Suggested standards. Years ago a committee composed of representatives
of the various ... for constructing graphs. This report¹⁰ ... points
required for the proper representation ... educational data ...

... The zero lines of the ... other coordinate lines.

6. For curves having a scale representing percentages, it is usually
desirable to emphasize in some distinctive way the 100 per cent line or
other line used as a basis of comparison.

¹⁰ W. C. Brinton, Chairman, ... Committee on Standards for Graphic
Representation, ... Quarterly ..., 14:790-797, 1915.



7. When the scale of a diagram refers to dates, and the period represented
is not a complete unit, it is better not to emphasize the first and last
ordinates, since such a diagram does not represent the beginning or end of
time.

8. When curves are drawn on logarithmic coordinates, the limiting lines of
the diagram should each be at some power of ten on the logarithmic scales.

9. It is advisable not to show any more coordinate lines than necessary to
guide the eye in reading the diagram.

10. The curve lines of a diagram should be sharply distinguished from the
ruling.

11. In curves representing a series of observations, it is advisable,
whenever possible, to indicate clearly on the diagram all the curves
representing the separate observations.

12. The horizontal scale for curves should usually read from left to right
and the vertical scale from bottom to top.

13. Figures for the scales of a diagram should be placed at the left and at
the bottom or along the respective axes.

14. It is often desirable to include in the diagram the numerical data or
formulae represented.

15. If numerical data are not included in the diagram, it is desirable to
give the data in tabular form accompanying the diagram.

16. All lettering and all figures on a diagram should be placed so as to be
easily read from the base as the bottom, or from the right-hand edge of the
diagram as the bottom.

17. The title of a diagram should be made as clear and complete as
possible. Subtitles or descriptions should be added if necessary to insure
clearness.
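Suggestion 8 lends itself to a small worked example. The sketch below picks
limiting lines for a logarithmic scale at whole powers of ten that enclose
the data; the function name `log_axis_limits` and the three values are
hypothetical.

```python
import math

def log_axis_limits(values):
    """Return (lower, upper) limits for a logarithmic scale, each a whole
    power of ten, chosen so that every value lies between them."""
    if min(values) <= 0:
        raise ValueError("a logarithmic scale needs positive values")
    lower = 10 ** math.floor(math.log10(min(values)))
    upper = 10 ** math.ceil(math.log10(max(values)))
    return lower, upper

# Hypothetical quantities to be plotted on a logarithmic scale.
limits = log_axis_limits([35, 420, 7800])
print(limits)
```

For these values the scale would run from 10 to 10,000, so that each
limiting line of the diagram falls on a power of ten.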

A useful manual which treats of the different phases of the construction
of line charts has been prepared by the Committee on Standards for Graphic
Presentation. For a fuller discussion of the general problem of graphical
representation, several excellent books listed at the end of this chapter
are available.

The suggestions given by Spear should be kept constantly in mind when
constructing graphs.


In the present day, when visual education in all aspects has become, not
only ... learning, our attention is called more than ever ... possibilities
in this field. The eye absorbs written ... cannot receive the message
hidden behind written ... reveals that message briefly and simply. Its
purposes, which follow, are clear from its context:

2. ... possible with textual matter alone.

3. ...

... depends ... the following six:



1. Determine the significant message in the data.

2. Be familiar with all types of charts and make the correct selection.

3. Meet the audience on its own level; know and use all appropriate visual
aids.

4. Give detailed and intelligible instructions to the drafting room.

5. Know the equipment and skills of the drafting room.

6. Recognize effective results.

Even when no technical assistance is available, teachers and
administrators can make excellent use of graphs to facilitate the
attainment of educational objectives.

Selected References for Further Reading

Arkin, Herbert, and Colton, Raymond R., Graphs: How to Make and Use Them.
New York: Harper & Brothers, 1936. 224 pages.

Brinton, Willard Cope, Graphic Presentation. New York: Brinton Associates,
1939. 512 pages.

Kelley, Truman L., Fundamentals of Statistics. Cambridge, Mass.: Harvard
University Press, 1947. Chapter IV, "Graphic Methods."

Modley, Rudolph, How to Use Pictorial Statistics. New York: Harper &
Brothers, 1937. 170 pages.

"Presentation Problems," a feature in the American Statistician since
August 1947.

Spear, Mary Eleanor, Charting Statistics. New York: McGraw-Hill Book
Company, 1952. 233 pages.

Thompson, Loring M., "Meaning in Space," ETC.: A Review of General
Semantics, 8:193-201, Spring, 1951.

Vernon, M. D., "The Use and Value of Graphical Methods of Presenting
Quantitative Data," Occupational Psychology, London, 26:22-34, January,
1952.



10 

The Uses and Limitations of Norms 


It is self-evident that the value of test scores will be dependent largely
upon how well they are understood. The preceding chapter concerned the
summarization of scores by graphical methods as an aid to their
interpretation. The present chapter will consider some closely related
problems of interpreting scores by the aid of norms.

A. Norms and Standards


Standardized versus nonstandardized tests. At the outset it is important
to distinguish clearly between a norm and a standard,¹ especially because
the terms are frequently used interchangeably. The confusion doubtless
arises over the fact that norms are used with standard tests and that a
part of the process of standardization is the derivation of norms.

Many standard tests began as informal objective tests made by classroom
teachers. When an informal test has gone through the process of
standardization, it then differs from the original class test in four
essential aspects. In the first place, the content has been standardized.
This means that each item has survived most careful scrutiny by a competent
person, or more likely a group, and that its difficulty and value have been
determined by rigid experimental processes that have eliminated its weaker
fellows. In the second place, its method of administration has been
standardized. This means that definite directions have been worked out,
usually with appropriate time limits and the like. In the third place, the
method of scoring has been standardized. This means that scoring keys have
been prepared and that definite rules have been formulated for marking the
test and for determining the score on each part and on the whole test.
Finally, the process of interpretation has been standardized, at least in
part. This means that tables of norms are now available for interpreting
the various scores made on the test. These norms are merely scores which
have been made by large numbers of pupils distributed over wide
geographical areas and representing various types of schools, and which
have been grouped, as a rule, according to chronological age or school
grade.

¹ John C. Flanagan emphasizes this distinction on page 698 and elsewhere in
"Units, Scores, and Norms," Chapter 17 in E. F. Lindquist (Editor),
Educational Measurement. Washington, D. C.: American Council on Education,
1951.

Norms versus standards. The word standard implies a goal or objective to
be reached. It should be clear, then, that a norm is not a measure of what
ought to be, a goal, but is merely a measure of what is, the status quo.
When a grade or class is up to the national median on the test, it is just
an average or typical group. Of course, it may be that this score
represents a reasonable performance for the group under the circumstances,
but that fact would have to be determined by further inquiry. The mere fact
that the grade attains the norm does not of itself establish anything other
than that the performance is that of a typical group. Manifestly a group of
students having superior opportunities and capacities ought to make better
than a typical record. On the contrary, a group of low ability and
opportunity might find it virtually impossible to do that well.
Unfortunately, at the present time not many tests have more than one set of
norms for each grade or age group, all types of pupils and schools being
lumped together.* What is needed is a norm for at least each major type of
school organization and type of pupil. Even then such norms could hardly be
regarded as reasonable standards of attainment. For one thing, the norms of
achievement tests are never more than tentative. They must be continually
changing with increases in length of school term and with improvement in
training of teachers, in textbooks, in school equipment, and the like. It
is also not unreasonable to assume, human nature being what it is, that
average achievement with the facilities now available could be considerably
better than exists at the present time. In a real sense the only valid norm
for the individual pupil is his own past record, and the only valid
standard is his maximum capacity for growth.

Reasonable standards, or goals of attainment, are almost altogether lacking. It is conceivable that such standards might be worked out and expressed in numerical units on existing tests, or on others to be devised. But such a process is inherently difficult, whereas the process of building norms is time-consuming and laborious but perfectly simple and straightforward. In fact, an adequate technique for establishing standards has yet to be worked out. Ideally, a standard would have to be provided for each individual. At any rate, no one standard could be established which would


25 669 October 1044 





THE TESTING PROGRAM


be equally appropriate for everybody, or even for any considerable number. In view of such considerations as these, Wood has said:*

As currently used, the word standard has no place in educational literature outside the perorations of convention orators.

Speaking more constructively, it is sufficient to point out that educational standards are necessarily individual and in their fundamental nature are akin to the standards of tailors and shoemakers, who judge the quality of their products by how well they fit the individual for whom they are intended and how long they serve him.

Swan has satirized the idea of a single uniform standard by imagining what would happen if all the tailors of the country got together and agreed upon a "standard suit." The distressing outcome is described* as follows:

Instead of the old haphazard procedure, the standard suit was brought out when a man went into a tailor shop to get a new suit. If he did not fit the suit, he was rejected then and there. He was thus sentenced to join a nudist colony. Men soon learned that the only thing to do was to eat the right food and take the proper exercise to make them just fit the suit. If he perchance ate something else than that required to make him fit the standard suit, he would be rejected, even though what he ate was better for him from the standpoint of health than that needed to get ready for the standard.

The important thing to remember is that, for the present at least, such standards do not exist in any subject. Certainly an understanding of the way norms are determined would make it obvious that they lay no claims to being goals of performance, unless perchance one is willing to accept mediocrity as a goal.


B. Raw Scores and Derived Scores


What a score means. To take a simple case, let us suppose that a certain pupil has made a score of 40 on a spelling test of 50 words. What does this score of 40 mean? To say that the score represents an achievement of 80 per cent is true as far as it goes, but this obvious interpretation leaves much to be desired. As the problem of interpreting a given score in meaningful terms is fundamental in all measurement, it deserves careful consideration.


A score on any test is simply a numerical description of an individual's performance on that test. A distinction must be made between test performance on the one hand and ability and capacity on the other hand. Performance is merely evidence of ability or capacity. Ability refers to an individual's actual achievement at the present time, whereas capacity refers


Basic Considerations of Educational Research 3 13 Feb- 

teaching' a.d 

* Ibi I page 27o 




to his potentialities. Since a test is always a sampling rather than a complete measurement, a pupil's response to the test situation is accepted as an expression of his ability operating under a given set of conditions. But a poor score on a valid achievement test is not necessarily evidence of poor ability in that subject under any and all conditions. It may be due to any number of factors, such as physical illness or discomfort, poor eyesight or hearing, emotional disturbance, or dislike for the teacher or subject.

In like manner a poor performance on even the best group test of intelligence available is not necessarily positive proof of a lack of what we call general intelligence. It may be due to any one factor or combination of factors mentioned above as operating in the case of achievement tests. In addition there are several other factors that may be responsible, such as poor reading ability, inability to understand the English language, and especially inadequate learning opportunities in school and outside. For example, Wheeler* found that the average intelligence of Tennessee mountain children as measured by two well-known group tests was approximately normal at six years but that it showed a fairly consistent decrease with increases in chronological ages. The data warrant the significant conclusion:


The general trend of this investigation indicates that the results of both tests are materially affected by environmental factors and that the mountain children are not as far below the normal as the tests seem to indicate. With the proper environmental changes the mountain children might test near a normal group.


Ten years later Wheeler* repeated the study in the same region, which had shown "definite improvement in the economic, social, and educational status" during the intervening period. Although there was still a tendency for intelligence as measured by the tests to decline in the upper years, the average IQ was ten points higher than it had been a decade earlier. A study of Kentucky mountain children by Asher* revealed similar results and led to the conclusion that "a valid comparison of the intelligence of urban children and of children in less favorable environments awaits more adequate measuring methods."

A group of researchers at the University of Chicago found that many intelligence test items were answered correctly much more frequently by children from the higher socio-economic levels than by those from the lower ones, and they concluded that the tests were penalizing the latter youngsters for their lack of contact with middle-class culture and lack of appreciation for middle-class behavior.* This interpretation has stirred up considerable discussion; many psychologists do not accept the Chicago group's conclusion that the items should be rewritten to be "fairer" to lower-class persons.

* L. R. Wheeler, "The Intelligence of East Tennessee Mountain Children," Journal ...
... Mountain Children," Journal of Genetic Psychology, 46: 480-486, June ...

The point is that capacity is always inferred from activity or performance. The inference, for example, that two identical scores on an intelligence test really mean equal degrees of intelligence cannot be safely made unless it is known that the learning opportunities have been at least approximately equal. A full realization of this fact would enjoin more caution than is often shown in the interpretation of scores on so-called tests of general intelligence. Trained examiners exercise care in observing rigidly controlled conditions for administering the tests and objective standards for scoring the papers, but it is often hard to be sure about the pupil's past history, which may be reflected to some extent in his present performance.

Raw scores versus derived scores. When a test paper has been marked according to instructions, the score obtained is called a raw score or crude score. On tests, as distinguished from quality scales, it is often called a point score, since the numerical description is in terms of points. On a scale, as for example the Ayres handwriting scale, the numerical description is hardly in terms of points but rather in terms of some arbitrary value assigned to a rank or position. In the example given above, the pupil has a point score of 40 on the spelling test. In other words, 40 describes his performance on that particular test at the time it was administered.

But a raw or point score by itself means very little. It is usually not possible to compare a raw score on one test directly with a raw score on another test. The difficulty is that the units are not comparable. The problem is much like that involved when adding fractions with unlike denominators. It is first necessary to find a common denominator, in this case 12, and then to express all values in terms of that denominator. The problem is then simple:


A + A + A + IS = H = 2,«V = 2J 


Kenneth Eells and others, Intelligence and Cultural Differences, 388 pages. Chicago: University of Chicago Press, 1951.

Allison Davis, Social-Class Influences upon Learning, 100 pages. Cambridge, Mass.: Harvard University Press, 1948.

InrelTgenSTnd^C^^^^^^^^ G Barley Reven 

Apnl 1902 Sa^enK h Applied Psychology 36 141143 

of Apphed PsychoU,gy 36 42^^ SSir”?fe2^'and if 

Intdhgenceand CulLal DiSerencef 4^ TS 

toCmoI Conjermce on Teolmj P,M^ pS^n N T “ ’I “« 




To meet a similar need, test makers have found it necessary to determine norms; scores expressed in terms of these norms are called "derived scores." A derived score is a numerical description of a pupil's performance in terms of a norm. The norm itself is the performance of a defined group considered to be typical. For example, a pupil who answers correctly 22 questions on the Thorndike-McCall Reading Test has a reading ability which is that of the normal, or average, twelve-year-old child at the end of the fifth grade.

Usually, with standard tests the norms used are either age norms or grade norms. The derived scores merely describe the individual's position in some group. Sometimes the age norms are carried one step further and expressed in terms of quotients; that is, one age score is divided by another. With the exception of quotients, most derived scores are obtained from tables of norms in the test manuals, which give in parallel columns the derived scores equivalent to various point scores. As the problems of interpretation differ somewhat for achievement and intelligence tests, they will be treated separately in the next two sections.


C. The Use of Norms in Interpreting Scores on
Intelligence Tests


Mental age versus intelligence quotient. The most commonly used units in which to express the results of an intelligence test are mental age and intelligence quotient, usually abbreviated MA and IQ.* It is important to understand the distinction between them. MA is a measure of mental maturity and "indicates the level of development which a child has reached at a given time," to use the words of Terman. This degree of mental maturity or level of development is expressed in terms of that "possessed by the average child of corresponding chronological age." For example, a point score of 75 on the Terman Group Test of Mental Ability is equivalent to an MA of 13 years and 2 months, usually written 13-2. This means that when the Terman Test had been given to hundreds of children in various parts of the country it was found that the average score of a child with a chronological age (CA) of 13 years and 2 months was 75 points. Any child who makes a score of 75 on this test is said to have an MA of 13-2. But pupils of various CA's make scores of 75 on the Terman Test. It is clear, therefore, that a 10-year-old child with an MA of 13-2 has matured rapidly, whereas a 14-year-old child with an MA of 13-2 has matured at a much slower rate. In other words, MA is a measure of stage or level of maturity but not of rate. Rate is indicated by the IQ, which is obtained by dividing the MA by the CA and multiplying the resulting


* Standard-score IQ's, which do not involve mental ages, are used with the Wechsler Intelligence Scales, both adult and child. They possess certain distinct advantages over the traditional IQ and are popular with individual testers.

Lewis M. Terman, The Intelligence of School Children. Boston: Houghton Mifflin Company, 1919.





fraction by 100: $IQ = 100 \times \frac{MA}{CA}$. In the preceding illustrations, with ages expressed in months, the IQ of the 10-year-old child whose MA is 13-2 would be $100 \times \frac{158}{120} = 132$.

The IQ, then, gives us a different interpretation of a score on an intelligence test from that afforded by the MA. The IQ is a measure of rate of maturity, whereas the MA is a measure of level or stage of maturity. In both cases rate and level are relative to the standardization group. If a child has matured rapidly, he is said to be bright; if he has matured slowly, he is said to be dull. A fuller interpretation is to be given a little later. For the present, it is sufficient to note that ordinarily both the MA and IQ of a pupil should be recorded, if available, for each has its distinctive values, and limitations.
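The arithmetic of the ratio IQ can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, ages are converted to months before dividing, and the result is rounded to the nearest whole number, which is an assumption about how the quotient is reported.

```python
def to_months(years: int, months: int = 0) -> int:
    """Convert an age written as years-months (for example 13-2) to months."""
    return 12 * years + months


def ratio_iq(ma_months: int, ca_months: int) -> int:
    """Ratio IQ: 100 times MA divided by CA, rounded to a whole number."""
    return round(100 * ma_months / ca_months)


ma = to_months(13, 2)                # MA of 13-2, i.e. 158 months
print(ratio_iq(ma, to_months(10)))   # the 10-year-old child
print(ratio_iq(ma, to_months(14)))   # the 14-year-old child
```

The same MA thus yields a quotient well above 100 for the younger child and somewhat below 100 for the older one, which is the sense in which the IQ expresses rate rather than level.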

Advantages of the MA concept. The MA has certain outstanding values. Probably the chief of these is that it makes possible a comparison with achievement scores also expressed in age units, as well as with the CA of the pupil, so long as the derived scores are obtained for the same population or from comparable populations. The age basis of comparison is a much more stable unit than the grade location, which is greatly influenced by the promotion policies of the school.

Limitations of the MA. There are also certain serious limitations of the MA, most of which apply particularly to the use of the concept in the high school. It has often been pointed out that the definition of MA does not hold true for CA's beyond 13 or 14. One reason for this is that the norms were based primarily upon pupils in school, who became an increasingly select group, the weaker ones tending to drop out. This was especially true when Terman was standardizing the original (1916) Stanford-Binet scale, against which many later tests were validated. Then, too, in spite of their appearance, the mental age units on the scale are probably of unequal length, the annual increments becoming smaller and smaller as they approach maturity, when the curve flattens out altogether. But no way has been devised so far for equating these units, or for making satisfactory allowance for their variation in length. This is the principal reason why true growth curves of mental development are not obtainable up to the present, even when the same individuals have been measured repeatedly over a long period of years. This also complicates the problem of investigating the constancy of the IQ, and of its computation in the later chronological ages. Neither of these limitations, however, is very serious in the elementary school.

There is a more serious limitation which appears to operate on all age levels: the scores on one test are often not comparable to those on another test. It is, of course, entirely possible that imperfect standardization is largely responsible. But whatever the explanation, it is clearly necessary in reporting intelligence test scores to indicate both the name of the test and the form used.

It may, of course, be true that under the circumstances the terms MA and IQ are not particularly fortunate when used to describe the scores on existing tests. Be that as it may, the users of these tests should frankly recognize such limitations as exist. It is a curious fact, however, that people are just as loath to recognize the limits of their brain children as are parents to recognize the limits of their flesh-and-blood children. The difficulty of arriving at a rational interpretation of a low score on an intelligence test may as well be due to myopia on the part of the interpreter as to that condition in the parent whose child received the score. Boynton* recommends what he calls a "pragmatic attitude" toward the tests, for the facts are that "in a vast majority of cases they work with a high degree of success."

After all, in spite of certain definite limitations, intelligence tests, when intelligently used, do afford valuable information to classroom teachers and school administrators. So long as that is true, whether they measure intelligence or something else, whether the age score is really mental age or only personal age, would appear to be primarily a matter of academic interest only.

One other limitation of MA and all other gross units is that by lumping together many elements they obscure significant differences. Two children of the same CA might have an MA of 8 years and yet be quite unlike. One child might be unusually strong in the linguistic elements of the test but lacking in the more concrete, practical, or common-sense elements, while just the reverse might be true of the other child. This means that the pattern of the test responses, as well as the total or average, must be considered. This, of course, does not mean that the total score has no value, but rather that by itself it is inadequate, especially for diagnosis and guidance. The practical suggestion, then, is to consider the pattern as revealed by the profile, as well as the total score, be that an age score or what not. As Thurstone says:* "Each individual should be described in terms of a profile of mental abilities instead of by a single index of intelligence."
The computation of the IQ. As ordinarily written, the formula for the IQ is $IQ = 100 \times \frac{MA}{CA}$. That is, the IQ is the quotient obtained by dividing the mental age of the pupil by his chronological age at the time the test was given. In other words, it is the percentage that the mental age

Paul L. Boynton, Intelligence: Its Manifestations and Measurement, page 231 ...
... "Primary Abilities," Educational Record, 17: 133, Supplement ...





is of the chronological age. As a matter of fact, however, the CA used as a divisor is never more than the age at which the test maker assumes mental maturity is reached. Upon the basis of the evidence available in 1916, Terman* suggested that a divisor of 16 years be used for all pupils whose CA is 16 or above. In the Revised Stanford-Binet, Terman and Merrill* suggest this rule:

Up to 13-0 the entire CA is counted; beyond 16-0, none of it. The CA of a subject who is between the ages of 13-0 and 16-0 is counted as 13-0 plus 2/3 of the additional months he has lived. This means that a true CA of 14 is counted as 13-8, a true CA of 15 as 14-4, and a true CA of 16 as 15-0, which is the highest divisor used in the computation of the IQ.*

This suggestion would appear to be in keeping with the fact that mental maturity is reached gradually rather than abruptly. The age at which it is attained probably varies considerably from test to test. Many writers favor using percentiles or standard scores, rather than IQ's, especially beyond the elementary school.*
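The divisor rule quoted above lends itself to a short sketch. The function names below are hypothetical, ages are handled in months, and rounding the adjusted divisor to a whole month is an assumption made here for illustration rather than something the rule itself specifies.

```python
def sb1937_divisor_months(ca_months: int) -> int:
    """CA divisor in months under the rule quoted for the 1937 revision."""
    thirteen, sixteen = 13 * 12, 16 * 12
    if ca_months <= thirteen:
        return ca_months                      # entire CA is counted
    capped = min(ca_months, sixteen)          # nothing beyond 16-0 is counted
    # 13-0 plus two thirds of the additional months (rounding is assumed)
    return thirteen + round(2 * (capped - thirteen) / 3)


def sb1937_iq(ma_months: int, ca_months: int) -> int:
    """IQ using the adjusted divisor, rounded to a whole number."""
    return round(100 * ma_months / sb1937_divisor_months(ca_months))


for years in (12, 14, 15, 16, 20):
    months = sb1937_divisor_months(12 * years)
    print(years, f"{months // 12}-{months % 12}")
```

For true ages of 14, 15, and 16 the sketch reproduces the divisors 13-8, 14-4, and 15-0 given in the rule, and every older examinee receives the maximum divisor of 15-0.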

The actual work of computing IQ's can be greatly reduced by the use of tables such as those in Terman and Merrill.* A preceding chapter emphasized the need for a careful checking of the scoring and totaling of all scores and the obtaining of the MA or other equivalents from the tables of norms in the manuals. There is one other step in computing the IQ, even with the use of tables, that must be watched carefully to insure accuracy. That is the determining of the CA of the pupil. In the lower grades this age score should be taken from the date of birth as shown on the school records, which in turn should be based upon a birth certificate. A young child is likely to put down 9 when he is merely "going on 9," for example. In the upper grades it is usually safe to rely on the pupil's answer as given

Lewis M. Terman, The Measurement of Intelligence, pages 140-141. Boston: Houghton Mifflin Company, 1916.

Lewis M. Terman and Maud A. Merrill, Measuring Intelligence, page 68. Boston: Houghton Mifflin Company, 1937.

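* In symbols the three formulas, with MA and CA expressed in years, are:

$IQ_{(CA \text{ less than } 13\text{-}0)} = 100\left(\frac{MA}{CA}\right) = \frac{100\,MA}{CA}$

$IQ_{(CA\ 13\text{-}0 \text{ to } 16\text{-}0)} = 100\left[\frac{MA}{13 + \frac{2}{3}(CA - 13)}\right] = \frac{150\,MA}{6.5 + CA}$

For examinees 16 years old or older the formula becomes

$IQ_{(CA\ 16\text{-}0 \text{ or more})} = \frac{100\,MA}{15} = 6.67\,MA$

These formulas apply only to the Revised (1937) Stanford-Binet Intelligence Scale; denominators for other tests may be quite different.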
The Wechsler Intelligence Scale for Children (WISC) has standard-score IQ's for age levels from 5-0 through 15-11. David Wechsler, Manual for the Wechsler Intelligence Scale for Children, pages 27-59. New York: The Psychological Corporation, 1949.

Lewis M. Terman and Maud A. Merrill, op. cit., pages 417-450.





on the test blank. On most tests he is asked to give his age at his last birthday, and then to give the year, month, and day of his birthday, or else to tell how many months it has been since his last birthday. The trouble usually comes with computing the months. This computation should always be checked, preferably by a simple table prepared by the examiner. Table 31 illustrates such a table, prepared for a test given on May 21, from which the months can be read directly. It is desirable to verify the years


TABLE 31

A Table for Computing Months Since Last Birthday
(Date of Test: May 21)

Birthdays Between Dates          Months Since Birthday
January 6 and February 5         4
February 6 and March 5           3
March 6 and April 5              2
April 6 and May 5                1
May 6 and June 5                 0  (May 21, Test Date)
June 6 and July 5                11
July 6 and August 5              10
August 6 and September 5         9
September 6 and October 5        8
October 6 and November 5         7
November 6 and December 5        6
December 6 and January 5         5



for those pupils whose birthdays come in the month the tests are given or in the next month or so, for even high-school pupils will often make an error of one year. When the correct MA's and CA's are determined, the IQ values can be read from a table. If the IQ's are computed by actual division, it is necessary to have the work done twice independently.
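The months column of Table 31 can be generated rather than written out by hand. The sketch below is an illustration under stated assumptions: the function name is hypothetical, a borrowed month is taken as 30 days, and the age is carried to the nearest month, so that 16 or more leftover days count as another month. A result that rolls over from 11 to 0 means the pupil is within about two weeks of his next birthday, which is why the years must be verified for such pupils.

```python
def months_since_birthday(birth_month: int, birth_day: int,
                          test_month: int, test_day: int) -> int:
    """Months since the last birthday, to the nearest month (0 to 11)."""
    whole = (test_month - birth_month) % 12
    extra_days = test_day - birth_day
    if extra_days < 0:
        # Birthday day not yet reached this month: borrow one month,
        # assumed here to be 30 days long.
        whole = (whole - 1) % 12
        extra_days += 30
    if extra_days >= 16:
        # More than half a month has passed, so round up.
        whole = (whole + 1) % 12
    return whole


# Rebuild the rows of Table 31 for a test given on May 21.
for month, day in [(1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (6, 6),
                   (7, 6), (8, 6), (9, 6), (10, 6), (11, 6), (12, 6)]:
    print(month, day, months_since_birthday(month, day, 5, 21))
```

Preparing the table once for the date of testing, and reading every pupil's months from it, avoids repeating the subtraction for each paper.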
Interpretation of the IQ. The IQ is a measure of brightness, or of rate of intellectual development. Following the lead of Terman, it has been customary to consider IQ's of 90 to 110 as "normal," those below as subnormal, and those above as supernormal. According to this scheme IQ's below 70 indicate "feeblemindedness." Individuals in this group are often subdivided into three types or levels of feeblemindedness: idiots, below 25; imbeciles, 25 to 49; and morons, 50 to 69, inclusive. Most clinicians recognize the

l~^j£Sv=Tj:LrfisS£iT. 

psychological social “'^‘iical nnd the 









and yellow, but it is hard to tell where orange leaves off and becomes red on the one hand or yellow on the other. The concept of "genius" is worthy of further consideration. Following the lead of Terman it has been common to interpret any IQ of 140 or above as indicating "genius or near genius." Evidence is accumulating which indicates that this limit is much too low. In an illuminating discussion of this problem, Hollingworth comes to the conclusion that a minimum IQ of 170 or 180 is more defensible, and that works of genius are conditioned by high ability when combined with zeal and hard work.* Terman supports the conclusion that "above the IQ level of 140, adult success is largely determined by such factors as social adjustment, emotional stability, and drive to accomplish."*

Advantages of the IQ. The identification of various degrees of brightness is one of the advantages of the IQ. Moreover, many studies have shown the IQ to remain relatively constant under ordinary conditions from year to year, although radical changes in the home and school environment, which rarely occur, are likely to be reflected in larger changes in IQ when they do occur. Nemzek* summarized 97 studies which used the 1916 Stanford-Binet test and 27 studies which used group tests. The median correlation coefficient by the test and retest method was .832 for the individual test and .846 for the group tests. The corresponding range of the middle 50 per cent of the coefficients was .760 to .889 and .779 to .880, respectively. The similarity of results for the individual and group tests is remarkable. But these correlations permit considerable variations, which apparently are more likely to occur at the extremes of the distribution than near the center. There seems to be a tendency for the lower IQ's to decrease somewhat on later tests, while the evidence for the higher IQ's is somewhat contradictory. After six years Terman found that 73 of his geniuses, still below a CA of 13, had lost in Stanford-Binet IQ, the boys 3 points and the girls 13 points on the average. On the other hand, Cattell at Harvard found that children with IQ's above 120 gained approximately 8 points in three to six years. Most studies have noted a regressive effect, however. But, of course, most cases tend to cluster rather closely about


Leta S. Hollingworth, Children Above 180 IQ, Stanford-Binet: Origin and Development. Yonkers-on-Hudson, N.Y.: World Book Company, 1942.

... Yearbook of the National Society for the Study of Education, Part I ... Bloomington, Ill.: Public School Publishing Company, 1940. Quoted in ... Revision of the Stanford-Binet Scale, page 13. Boston: Houghton Mifflin Company, 1942.

... "Constancy of the I.Q.," Psychological Bulletin, 30 ..., February ...





the center of the distribution, where the IQ is "fairly stable." After a summary of the experimental evidence, Cattell arrives at this practical conclusion:*

The results are reported as evidence of the large changes in the IQ ... and emphasize the caution with which the ... intelligence test must be interpreted, even though it be an individual examination made by an expert.

Since the IQ of the average pupil is likely to be relatively stable if originally computed for CA's between 7 and 13 years, his MA at a later age can be estimated with fair assurance from his present CA and his IQ. For example, suppose that a pupil who had an IQ of 95 when in the third grade is now in the fifth grade. His present CA is 10-2, or 10.17 years. His estimated MA in years is $10.17 \times .95$, which is $9.66 = 9\frac{2}{3} = 9\frac{8}{12}$, or 9-8. Comparisons with achievement test scores expressed in ages can therefore be made wherever such comparisons are thought desirable, without the necessity of repeating the intelligence tests at the same time. Although it is desirable to repeat intelligence tests until at least three tests have been given during the pupil's educational career, the tests need not be given at the same time as the achievement tests in order to make comparisons. The IQ from a test given by an expert to pupils in the public school may be regarded as sufficiently constant to make adjustments for a different date fairly safe, at least for a period of two or three years.
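The estimate in the example above is simply the IQ formula solved for MA. A minimal sketch follows, with hypothetical function names and with the result rounded to the nearest month, which is an assumption about the precision wanted.

```python
def estimate_ma_months(iq: float, ca_months: int) -> int:
    """Estimated MA in months: IQ times CA divided by 100."""
    return round(iq * ca_months / 100)


def as_years_months(months: int) -> str:
    """Write an age in the years-months form used in the text."""
    years, remainder = divmod(months, 12)
    return f"{years}-{remainder}"


ca = 10 * 12 + 2                      # present CA of 10-2
print(as_years_months(estimate_ma_months(95, ca)))
```

Working in months avoids the small rounding involved in converting 10-2 to 10.17 years, and the answer agrees with the 9-8 obtained in the text.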
Limitations of the IQ. In common with all units in which test scores are expressed, the IQ suffers from two limitations: the zero point is arbitrary rather than real, and the various units are of unequal length or value. The difference between 60 and 70 is not equal to the difference between 90 and 100, or to the difference between 120 and 130. In the same way it is absurd to say that a pupil whose IQ is 120 is twice as bright as one whose IQ is 60, or half again as bright as one whose IQ is 80. But in this regard the IQ is no worse than are all raw test scores and practically all other derived scores. For example, it is obvious that when the thermometer registers 10 degrees below zero it is not twice as cold as when the thermometer registers 5 degrees below.

There is also another serious limitation of the IQ. Many studies have shown that the IQ's on one test are not comparable to those obtained on another test. This was shown clearly in Table 24 on page 110, where the mean IQ's on five intelligence tests ranged from 96.4 to 118.2 for the same 284 high school seniors. The ranges of individual scores on the five tests are not reported, but it is safe to infer that some of them must have been quite large. IQ's from different tests must be equated in order to make them comparable, for even when average IQ's on different tests are close together the extremes are likely to vary widely. One solution which has

Psyche Cattell, "Stanford-Binet IQ Variations," School and Society, 45, May 1, 1937.





TABLE 32

Table for Equating Intelligence Quotient Values
(Use only with groups which were sixteen years of age or over when tested.)

Corrected   Otis      Calif.*   Calif.*   Calif.*     Terman-   SRA       SRA       Corrected
IQ Value    Higher,   Total     Non-      Language    McNemar   PMA       Non-      IQ Value
            Form A              Language                                  Verbal
90          88        96        87-88     96          87        75        97        90
91          89        97        89-90     97          88        76        98        91
92          90        98        91-92     ..          89        77        99        92
93          91        99        93-94     98          90        78        100       93
94          ..        100       95-96     ..          91        79        101       94
95          92        101       97-98     99          92        80        102       95
96          93        102       99-101    ..          93        81        103       96
97          94        103       102-105   100         94        82-83     104       97
98          95        104       106-108   ..          95        84        105       98
99          96        105       109-110   101-102     96        85-87     106-107   99
100         97        106       111-112   103-104     97        88        108-109   100
101         98        107       113       105         98        89        110-111   101
102         99        108       114-116   106         99        90        112       102
103         100       109       117-118   107         100       91        113       103
104         101       110-111   119       ..          101       92        ..        104
105         102       112       120       108         102       93        114       105
106         103       113       121-122   109         103       94        115       106
107         ..        114       123       ..          104       95        116       107
108         104       115       124       110         ..        ..        ..        108
109         105       116       125-126   111         105       96        117       109
110         106       117       127-128   112-113     106-107   97        118-119   110
111         107       118       129-130   114         108       98        120       111
112         108       119       131-133   115         109       99        121-122   112
113         109       120       134-136   116-117     110       100-101   123-124   113
114         110       121       137-139   118         111       102-103   125-126   114
115         111       122       140-141   119         112       104       127       115
116         112       123       142-143   120         113-114   105-106   128       116
117         113       124       144       121         115       107       129       117
118         114       125       145       ..          116       108       130       118
119         115       126       146       122         117       109       131       119
120         116       127-128   147       123-124     118       110-111   132-133   120
121         117       129       148       125-126     119       112-113   134-135   121
122         118       130       149       127-128     120       114       136-137   122
123         119       131       ..        129         121       115       138-139   123
124         ..        132       150       130         122       116       ..        124
125         120       133       151       131         123       117       141-142   125

(Two dots mark a cell with no entry.)

* These values apply only to the California Test of Mental Maturity, Short Form, 1942 edition.

This is a slight modification of Table XXIV in an unpublished report by Walter G. Heil and Alice Horn, "A Comparative Study of the Data for Five Different Intelligence Tests Administered to 284 Twelfth Grade Students at South Gate High School." Los Angeles: Curriculum Division, Los Angeles City School Districts, February, 1950. (Mimeographed.)



been proposed is to equate all tests in terms of some widely used test.* Another is to equate tests locally. This was done by Heil and Horn, as shown in Table 32, for the 284 second-semester twelfth-grade students in a Los Angeles, California, high school. Their five intelligence tests were the Otis Self-Administering Test of Mental Ability, Higher Examination, Form A; the California Short-Form Test of Mental Maturity, 1942 edition; the Terman-McNemar Test of Mental Ability; the Science Research Associates Primary Mental Abilities tests; and the SRA Non-Verbal Form. The bold-face IQ's going from 90 to 125 on each side of Table 32 are corrected values. Note, for instance, in the 117 row that actual IQ's on the various tests range from 107 to 144. At 125 they run from a low of 117 to a high of 151.
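The use of such an equating table amounts to a lookup: find the row in which the obtained IQ falls for the test in question and read off the corrected value. The sketch below is illustrative only. It holds just the 117 and 125 rows discussed above, as read from Table 32; the dictionary layout, the short test labels, and the function names are hypothetical.

```python
# Each entry gives the (low, high) obtained IQ on that test for one
# corrected value. Only two rows of Table 32 are included here.
TABLE_32_ROWS = {
    117: {"Otis": (113, 113), "Calif. Total": (124, 124),
          "Calif. Non-Language": (144, 144), "Calif. Language": (121, 121),
          "Terman-McNemar": (115, 115), "SRA PMA": (107, 107),
          "SRA Non-Verbal": (129, 129)},
    125: {"Otis": (120, 120), "Calif. Total": (133, 133),
          "Calif. Non-Language": (151, 151), "Calif. Language": (131, 131),
          "Terman-McNemar": (123, 123), "SRA PMA": (117, 117),
          "SRA Non-Verbal": (141, 142)},
}


def corrected_iq(test: str, obtained: int):
    """Return the corrected IQ for an obtained IQ, or None if not tabled."""
    for corrected, row in TABLE_32_ROWS.items():
        low, high = row[test]
        if low <= obtained <= high:
            return corrected
    return None


def spread(corrected: int):
    """Lowest and highest obtained IQ across tests for one corrected value."""
    cells = TABLE_32_ROWS[corrected].values()
    return min(low for low, _ in cells), max(high for _, high in cells)


print(corrected_iq("SRA PMA", 107), corrected_iq("Calif. Non-Language", 144))
print(spread(117), spread(125))
```

Both an SRA PMA quotient of 107 and a California Non-Language quotient of 144 lead to the same corrected value, which is the whole purpose of the equating.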

Lennon has provided tables yielding equivalent scores and equivalent IQ's for the Otis Quick-Scoring Mental Ability Tests, Gamma Test, Form Am or Bm; Pintner General Ability Tests, Verbal Series, Advanced Test, Form A or B; and Terman-McNemar Test of Mental Ability, Form C or D. An IQ of 100 on the Otis is equivalent to 99 on the Pintner and 101 on the Terman-McNemar. Corresponding to an Otis IQ of 130 is a Pintner IQ of 134 and a Terman-McNemar IQ of 138. At the lowest IQ level shown in Lennon's Table III are the following figures: Otis, 76; Pintner, 64; and Terman-McNemar, 60. Therefore, among these three tests issued by the same publisher, discrepancies at the "average" IQ of 100 are slight, but differences may be serious for low or high scorers unless the recommended conversion is made. The Heil-Horn and Lennon studies point up the fact that an individual may have as many IQ's as there are different intelligence tests.

Doubtless the fundamental solution is for all test makers to standardize their tests, whether they aim to measure intelligence or achievement, on a national population so chosen as to conform fully to the mathematical theory of sampling. As long as tests continue to be standardized on samples chosen primarily upon the basis of convenience, even when they involve large numbers and wide geographical areas, there is still no assurance that the samples are truly representative and thus comparable with each other. Because of its numerous limitations some authorities would abandon the IQ concept altogether. Stoddard, for example, characterizes it as a "myth" pure and simple. No one recognizes the limitations of the IQ more clearly than its friends, as this statement from Terman indicates: "An obtained

W. S. Miller, "Variation of IQ's Obtained from Group Tests," Journal of Educa...

... of Three Intelligence Tests," Test Service Notebook, No. 11. Yonkers, N. Y.: World Book Company ...

George D. Stoddard, The Meaning of Intelligence, page 208. New York: The Mac...

... Yearbook of the National Society for the Study of Education, op. cit., page 466.





IQ is not only subject to chance errors resulting from inadequate sampling of abilities, but also of numerous other errors, including practice effects, negativism or shyness, the personal equation of the examiner, and standardization errors in the test used."

All things considered, the authors are disposed to agree with Terman and Merrill that the sensible thing to do under the circumstances is to "employ the simplest indices available and as rapidly as possible acquaint teachers, school counselors, social workers, and physicians with their significance and their limitation." The MA and the IQ are examples of such "simple indices." However, amateur test users will do well to remember at all times Hildreth's warning that "no one IQ ever indicates exactly any child's tested ability. No matter how obtained, the IQ should never be accepted as the final verdict, but rather as a point of departure for further investigation."

Other derived scores. To avoid the difficulties in the MA and IQ, other types of derived scores have been proposed. Of these, the three most common will be considered briefly. It must be apparent at the outset, however, that no norm can be any better than the sample upon which it is based or the measuring instrument employed. Errors of sampling and of measurement cannot be avoided by the simple device of shifting the unit in which to express the norms. The Cooperative tests have moved in this direction by taking as a point of reference the "50 point," the score "intended to represent the score which the average white child in the United States would make at the end of the particular course if he had attended a typical school and had had the usual instruction in the subject in question."

The Personal Constant (PC) has been suggested by H. Heinis as a substitute for the IQ. Kuhlmann was so convinced of the merits of this method that he included a table of Heinis Mental Growth Units, which he recommended in place of the IQ for use with the Kuhlmann-Anderson tests. On the other hand, Cattell found the PC more constant than the IQ for pupils of low intelligence but not for those of high intelligence. The PC has not received wide acceptance.

A second substitute, proposed by many writers, is the percentile rank. A percentile rank is a description of a pupil's position in a typical age or grade group in terms of the percentage of pupils who fall below that score. A percentile rank of 50 would, of course, be exactly at the median. In like


Lewis M. Terman and Maud A. Merrill, op. cit., page 29.

Gertrude Hildreth, "Stanford-Binet Retests of Gifted Children," Journal of Educational Research, 37:301, December, 1943.

John C. Flanagan, The Cooperative Achievement Tests: A Bulletin Reporting the Basic Principles and Procedures Used in the Development of Their System of Scaled Scores, page 19. New York: Cooperative Test Service, 1939.

Psyche Cattell, "The Heinis Personal Constant as a Substitute for the IQ," Journal of Educational Psychology, 24:221-228, March, 1933.



manner, a percentile rank of 10 would show that in a typical group only 10 per cent make lower scores, and a percentile rank of 90 would mean that only 10 per cent make better scores than that, since 90 per cent fall below. This is a very simple and useful system that is widely used for achievement tests also, but it has two limitations. One is that a percentile rank of a given magnitude in one group is not directly comparable with the same percentile rank in another group. A 10th-percentile pupil in the freshman class is manifestly not the same as a 10th-percentile pupil in the senior class, for example. A second limitation is that the percentile rank units are of unequal length. For example, in a typical group an IQ of 62 has a 1st-percentile rank and an IQ of 68, which is an increase of 6 points, has a 3rd-percentile rank, but another IQ change of 6 points, from 85 to 91, raises the percentile rank 11 points. In other words, the distances between percentiles near the center of the group are much less than those at the extremes.
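The unequal length of percentile units can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that IQ's follow a normal curve with a mean of 100 and a standard deviation of 16, and the function name is hypothetical.

```python
import math

def percent_below(iq, mean=100.0, sd=16.0):
    """Percentage of a normal distribution that falls below `iq`.

    The normal curve and the default mean and sd are assumptions
    made for this illustration.
    """
    z = (iq - mean) / sd
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The same 6-point IQ step, taken near the bottom and nearer the middle.
low_gain = percent_below(68) - percent_below(62)
mid_gain = percent_below(91) - percent_below(85)

print(f"IQ 62 -> 68 gains {low_gain:.1f} percentile points")
print(f"IQ 85 -> 91 gains {mid_gain:.1f} percentile points")
```

Under these assumptions the step from 62 to 68 gains between one and two percentile points, while the step from 85 to 91 gains about eleven.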

A third procedure is the method of standard scores, sometimes called sigma scores or z-scores, used by Stutsman in her Merrill-Palmer performance scale and by Wechsler in his scales. These units are expressed in terms of the mean and standard deviation of the typical age or grade group or, for that matter, of any group. An illustration will help to make the system clear. Suppose that a pupil makes 40 points on one test and 80 points on another test. It is clearly unsafe to say that he did better on one test than on the other. This is evident if we find that the mean score on the first test is 30 points and that the mean score on the second is 90 points. In other words, the pupil is 10 points above average on one test and 10 points below average on the other test. To reduce these scores to a common denominator requires one additional step, namely, to take into consideration the variability of the scores as well as their central tendency. Suppose, then, that the standard deviation of the first is 10 points and that of the second test is 20 points. It is now clear that our pupil is 1.0 standard deviation distance above the mean on one test and .5 standard deviation distance below the mean on the other. These two figures are standard or sigma scores and are written +1.0σ and -.5σ. To avoid negative numbers, the suggestion is sometimes made that the mean score be called 50 arbitrarily and each standard deviation distance above and below be equivalent to 10 points. In our illustration opposite, the pupil's scores would be 50 + (1 × 10) = 60, and 50 - (0.5 × 10) = 50 - 5 = 45.
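The two steps of the illustration, first the sigma score and then the shift to a scale centered at 50, may be sketched as follows. The function names are hypothetical; the figures are those of the example above.

```python
def sigma_score(raw, mean, sd):
    """Distance of a raw score from the group mean, in standard deviations."""
    return (raw - mean) / sd

def scaled_score(z, center=50.0, points_per_sd=10.0):
    """Re-express a sigma score on a scale with mean 50 and 10 points per sd."""
    return center + points_per_sd * z

first = sigma_score(40, mean=30, sd=10)    # first test
second = sigma_score(80, mean=90, sd=20)   # second test

print(first, scaled_score(first))
print(second, scaled_score(second))
```

Although the raw score on the second test is twice that on the first, the pupil stands higher in his group on the first test.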

The system has much to commend it statistically. In fact its principal limitation is that it appears to be rather cumbersome to handle. That impression is, however, probably due more to its unfamiliarity than to anything else. Some writers point out that not only are MA and IQ computed in the usual fashion indeterminate for the upper half of the adult population, but they also argue that standard scores or percentile scores yield much more information even for young children. Figure 35 shows the relation between standard scores, percentile scores, and Revised Stanford-Binet IQ's. It will be noted that the IQ's on the Revised Stanford-Binet may be considered roughly as standard scores whose mean is 100 and whose sigma is 16.

L. L. Thurstone and Thelma Gwinn Thurstone, ... Examination ..., American Council on Education Studies, ... 5, 1941.



Figure 35. The Relation Between Standard Scores, Percentile Ranks, and Revised Stanford-Binet IQ's. (Based on Terman and Merrill, Measuring Intelligence, page 42.)


Regardless of the type of norms used, the teacher must never lose sight of the fact that all measurement is subject to error and that scores can rarely be taken at face value. Some persons are so impressed by the "ubiquitous probable error," to use Kelley's phrase, that they think numerical scores of every kind "convey an unwarranted impression of exactitude," and would describe results on intelligence tests in general terms, such as ..., "normal," or "bright." In the writers' judgment, a better practice is to continue to employ the numerical scores but to be keenly aware of their limitations.


D. The Use of Norms in Interpreting Scores on
Achievement Tests

Educational age versus educational quotient. In interpreting scores on achievement tests the terms educational age and educational quotient are sometimes used in just the same way that mental age and intelligence quotient are used in interpreting scores on intelligence tests. In other words, educational age, or EA, is a measure of educational maturity, or level of educational growth. In like manner the educational quotient, or EQ, is a measure of rate of educational growth or development. The EQ is

Binet sL? the Stanford 

43 72 76 January 40 196, March. 1943 AUo Psychological BuUelm 




obtained by dividing the EA by the CA. For example, a 10-year-old boy has made a score of 60 points on a certain achievement test, which is the average score for a 12-year-old pupil. The boy is then said to have an EA of 12-0, which, divided by 10-0, gives him an EQ of 120. In like manner another 10-year-old boy in the same class might make a score of 35 points, which is the average score for a pupil of 8 years and 6 months. His EA is 8-6, and his EQ is 85. It should be noted that the terms EA and EQ refer to scores made on general achievement tests or on test batteries involving several subjects. If a test in only one subject is used, the terms subject age and subject quotient are employed. For example, a reading test would yield reading ages and reading quotients, while an arithmetic test would yield arithmetic ages and arithmetic quotients, and so on for the other subjects.
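The computation of the EQ for the two boys may be sketched as follows. The helper names are hypothetical, and ages are written in the years-months form used in the text.

```python
def age_in_months(age):
    """Convert an age written as 'years-months' (e.g. '8-6') to months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)

def educational_quotient(ea, ca):
    """EQ = 100 * EA / CA, with both ages given as 'years-months'."""
    return 100.0 * age_in_months(ea) / age_in_months(ca)

eq_first = round(educational_quotient("12-0", "10-0"))
eq_second = round(educational_quotient("8-6", "10-0"))

print(eq_first, eq_second)
```

The same function serves for a subject quotient if a reading age or arithmetic age is supplied in place of the EA.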
Uses of EA. The value of EA and of the various subject ages is that they make possible a meaningful interpretation of scores in terms of a relatively stable unit, chronological age. They also facilitate important comparisons with norms, on both intelligence and other achievement tests, whenever they have been standardized on comparable groups, as well as with the individual's own MA and CA.


Limitations of EA. EA and all subject ages have many of the limitations already pointed out in the case of the MA. Probably the most serious is that they reflect the promotion policies and holding power of the schools in which the tests are given. It is a matter of common observation that the performance of a 10-year-old pupil who is retarded in the grade is not the same as that of a 10-year-old pupil who has made normal progress, and much less than that of the accelerated pupil of the same age. Crawford has made an extensive study of the influence of such factors upon norms based on unselected groups, and comes to this significant conclusion: "The factors of chronological age, mental age, and rate of progress affect test norms to a degree that makes the use of norms based on groups in which these are not controlled of doubtful value." His recommendation is that both CA and MA should be used in establishing norms. One solution to the problem is to use only pupils whose CA's are normal for their respective grades in computing norms.

One other limitation of existing age norms is that age units on one test are not comparable to those on other tests that are presumably measuring the same thing. Test publishers owe a service to the public and should prepare tables for equating age norms on various achievement tests in much the same way as has been done by the Cooperative Achievement Tests and the various parts and forms of the Metropolitan and Stanford Achievement Tests. Another less serious limitation is that the age units on


John R. Crawford, "Age and Progress Factors in Test Performance," ...

Published by the ... Educational Testing Service.
Published by World Book Company.





any one scale or test are not necessarily equivalent throughout its length. An important limitation of the EA and all other gross units is that they lump together many and diverse elements in such a way as often to obscure significant differences. Two pupils of the same CA or MA might have an EA of 10-0, for example. This does not guarantee that they are by any means identical in achievement. One pupil might be greatly accelerated in reading, language, and literature, but retarded in arithmetic, spelling, and science, whereas the exact opposite might be true of the other pupil. EA, which is a composite, or average, has taken no account of the pattern, which may afford the key to an adequate interpretation. This fact, of course, does not mean that age scores and other averages have no value, but rather indicates that they are inadequate by themselves. The practical implication is clear. The total score, whether age or what not, is important, but must be considered always in relation to the pattern of the responses, usually best represented as a profile. Figure 36, showing two profiles drawn



Figure 36. The Profiles of Two Pupils Who Made the Same Total Score on a General Achievement Test. (From the Modern School Achievement Tests, published by Bureau of Publications, Teachers College, Columbia University.)


upon the same chart for pupils making the same total score, should make this point clear.

Use and limitations of EQ. The method of computing the EQ and the various subject quotients has already been described. As measures of rate of educational progress these quotients may be useful in the interpretation of scores on achievement tests. However, no such elaborate scheme for the interpretation of these quotients as was described for the interpretation of quotients on intelligence tests has been worked out. There is




nothing corresponding to such terms as "feeblemindedness" or "genius" for EQ. Since educational growth continues at least throughout the formal school period, there is no problem of selecting a maximum divisor such as ... of unequal length and that the quotient technique is not appropriate for use in high school, where age norms on achievement tests are ordinarily not available. Undoubtedly the most serious limitation of quotients, as well as of other norms, is that the units on one test are not directly comparable with those on another test that purports to be measuring the same thing. Tables for equating quotients on achievement tests similar to those for intelligence tests have not been published. Better still would be the exercise of greater care in the original standardization. The test record should always indicate the name of the test and the form used, for achievement tests as well as for intelligence tests.

Use and limitations of grade norms. It is a very common practice to interpret achievement tests in terms of grade norms. Grade norms on standard tests are usually the average scores made on the test by pupils in each grade when the test has been given to pupils in widely scattered areas. In the earlier tests the grade norms were usually for the end of the grade only, although sometimes for the middle of the grade also. This made comparison with norms somewhat difficult, unless the tests were given at the same time in the year. Figure 31, on page 267, illustrates a simple graphical method of making comparisons with norms for a different time in the year. Of course, such a comparison assumes uniform progress throughout the grade, which may be only approximately true in some instances. A slight variation of such norms for high school use is to base the norms upon the length of time the subject has been studied rather than upon the grade or year in which it happens to be offered. Since many high-school subjects do not continue over the entire high-school period and have no definite grade location, norms based on the number of semesters the subject has been studied are very useful. The problem of interpretation is just the same as for regular grade norms.
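The assumption of uniform progress throughout the grade amounts to straight-line interpolation between two end-of-grade norms. The sketch below is an arithmetic counterpart of the graphical method, not a reproduction of it. The norm values and the ten-month school year are hypothetical, chosen only to show the computation.

```python
def interpolated_norm(prev_end_norm, this_end_norm, months_elapsed,
                      months_in_year=10):
    """Estimate the norm part-way through a grade.

    Assumes progress is uniform between the end-of-grade norm for the
    previous grade and the end-of-grade norm for the present grade.
    The ten-month school year is an assumption of this sketch.
    """
    fraction = months_elapsed / months_in_year
    return prev_end_norm + fraction * (this_end_norm - prev_end_norm)

# Hypothetical end-of-grade norms: 30 points for grade 4, 40 for grade 5.
# Estimated norm for a class tested three months into grade 5:
estimate = interpolated_norm(30, 40, months_elapsed=3)
print(estimate)
```

If progress within the grade is in fact uneven, the estimate is correspondingly rough.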

In recent years many tests below the senior high school level have norms available for every month in the school year. For example, 6.0 means the norm for the beginning of the sixth grade, while 6.5 is the norm for the middle of the grade. In like manner 4.2 means the norm for the fourth grade two months after school starts, and 4.10 means the norm for the end of the fourth grade. Such norms are often called G-scores, and sometimes B-scores. They have the distinct advantage of being readily understood. They also have certain dangers and limitations. For one thing they tend to imply a degree of mathematical exactness which the accuracy of existing tests hardly warrants. Certainly it is unsafe to take them literally at the




face value. A still more serious limitation is the lack of comparability of scores on different tests. Adams found, for example, that eight arithmetic tests rated the mean performance of 152 pupils all the way from the fifth grade to the eleventh grade, depending upon the test used. It is unnecessary to comment upon the absurdity of fractional grade norms in a situation like that. The solution is, however, not so much in the abandonment of grade norms as in their further refinement. There is another danger in interpreting grade norms, no matter how accurately determined. This danger arises partly from the fact that a school with an overstrict promotion policy will tend to show up favorably on grade norms simply because of the presence of a great many pupils in the several grades who really belong in higher grades. It is always well, therefore, in any apparently superior school, to make a comparison on the basis of age, to see whether the superiority is real or only illusory. Of course there would be little difference between schools which promote strictly on the basis of CA and those in which the percentages of acceleration and retardation are balanced, a condition which rarely exists.

Ruch and Segel, after noting some evidence that "recent tests may possibly have much more dependable norms than those standardized a decade or so earlier," nevertheless make the suggestion that

many factors peculiar to the individual school system must be considered in the interpretation of tests, such as the legal age of school entrance, the actual average age of school entrance, rates of acceleration and retardation, rates of elimination from school, percents of failures of pupils, genuine differences in instructional efficiency, and variations in average mental and educational capacity from school to school.


Harris has called attention to a rather common error made in the interpretation of grade norms at the primary level. It arises from the failure to take into account the fact that zero performance on an achievement test is 1.0. A first-grade class whose grade score at the end of the year is 2.0 has made only normal progress for the year.
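The grade-and-month notation, together with the point that the scale begins at 1.0 rather than at zero, may be sketched as follows. The function names are hypothetical, and the ten-month school year is an assumption of the sketch. Note that the score must be read as a grade and a month, not as a decimal number, or 4.10 would be confused with 4.1.

```python
def parse_grade_score(text):
    """Split a grade score such as '4.10' into (grade, month)."""
    grade, month = text.split(".")
    return int(grade), int(month)

def months_of_progress(text, months_in_year=10):
    """Months of schooling represented by a grade score, counted from 1.0.

    Assumes a ten-month school year, so that 1.0 stands for zero progress.
    """
    grade, month = parse_grade_score(text)
    return (grade - 1) * months_in_year + month

# Read as decimals, '4.10' and '4.1' are the same number.
decimals_collide = float("4.10") == float("4.1")

print(parse_grade_score("4.10"), parse_grade_score("4.1"))
print(months_of_progress("1.0"), months_of_progress("2.0"))
```

A first-grade class scoring 2.0 at the end of the year thus shows ten months of progress, that is, one normal school year.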

Other norms for achievement tests. Several other types of norms are used, some of which require brief mention. Of these, doubtless the most important are percentile norms. As in the case of intelligence tests already discussed, such norms interpret a pupil's score by describing his position in the group in terms of the per cent of pupils who fall below the score made. Generally all percentile ranks from 0 through 99, but sometimes only certain points such as the 25th, 50th, and 75th, are given. These percentile


... unpublished master's thesis by Eunice ... Arithmetic Tests, University of California ... 3:39, February, 1933.

... of the Individual Inventory in Guidance, page 82 ... United States Office of Education, 1939.




norms have two limitations, neither of which is usually very serious for most purposes. One is that the scale units are unequal in length, and the other is that percentile values in one grade or age group are not directly comparable with those in another. Standard scores are also used. They are interpreted in the same manner as are similar scores on intelligence tests. These have already been discussed. McCall has proposed a modification called a T-score based upon a standard group composed of 12-year-old pupils. All age and grade groups are described by locating their T-score position in this 12-year-old group. The mean is given a value of 50, and each standard deviation distance above and below is divided into tenths, each counting one point. For example, a 15-year-old pupil makes a reading score on the Thorndike-McCall Reading Test, which, according to the table of norms, is a T-score of 60. In other words, this pupil is located 1σ distance above the mean of typical 12-year-old pupils who have taken the test. The T-score technique is sometimes used with other age groups. The principal limitations of the T-score are that it is not well adapted to high-school tests and that it is rather cumbersome even for grade-school tests.
Value of local norms. Practically all norms published on tests are so-called national norms. When such norms are carefully derived, they are of great value in interpreting the scores. It is easy to overemphasize their value for the ordinary school and school system, however. They must never be taken as standards. There are such wide variations in the length of school terms, in the equipment of schools, in the training and experience of teachers, and in other important respects among the several states and among the school units of any one state as to make any single series of norms for the whole nation inadequate. National norms must be supplemented by norms for the state, county, and city school systems, and even for the individual school. What is really important in most cases is the comparison of grades, classes, and schools which operate under approximately the same conditions. Lindquist has pointed out several distinct advantages of the regional testing programs used in Iowa for a number of years.

For purposes of classification, what is needed is a set of norms for the school itself. To derive satisfactory local norms, all that is required is to combine all pupils in the same grade and then to compute standard scores or percentiles. If age norms are desired, the pupils will be distributed according to CA or MA, and the medians computed. In larger schools and school systems norms should be derived for slow and rapid learners as well as for average or normal learners on each grade level.
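The derivation of local norms for one grade may be sketched as follows. The scores are hypothetical, the function name is an assumption, and the percentile rank is taken, as earlier in the chapter, to be the per cent of pupils falling below a given score.

```python
from statistics import mean, pstdev

def local_norms(scores):
    """Build a table of local norms from all scores in one grade.

    For each distinct score the table gives the percentile rank
    (per cent of pupils scoring below it) and the standard score
    (distance from the grade mean in standard deviations).
    """
    m = mean(scores)
    sd = pstdev(scores)
    n = len(scores)
    table = {}
    for s in sorted(set(scores)):
        below = sum(1 for x in scores if x < s)
        table[s] = {
            "percentile_rank": 100.0 * below / n,
            "standard_score": (s - m) / sd,
        }
    return table

# Hypothetical scores for the ten pupils of one grade.
grade_scores = [12, 15, 15, 18, 20, 22, 25, 25, 28, 30]
norms = local_norms(grade_scores)

for score, entry in norms.items():
    print(score, entry["percentile_rank"], round(entry["standard_score"], 2))
```

With a group as small as this the table is only a demonstration; in practice all pupils of the grade in the school or school system would be combined.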

It would appear, then, that the more specific the norm the more useful 


... Directions for Measurement and ... on Education, 1944.







it becomes. Educators are coming to recognize that each individual has his own unique pattern of growth. This position is clearly stated as follows:

The time has come when we should cease to be primarily interested in comparing one child with another, one class with another, or any class with a norm. We should be primarily interested in comparing each child with himself, with his past record and with his potentialities. To center attention elsewhere is to miss the point, to miss the service which tests can render.

Figure 37 is the test score profile of a college junior, Richard Roe, based upon local norms at "Siwash" College. He was administered an intelligence test, an English battery consisting of grammar, organization, and reading tests, the latter having vocabulary, reading speed, and reading level subtests, and a five-test achievement battery: history, literature, science, art, and mathematics. The norms are shown both as percentile ranks and as "stanines" (standard scores on the basis of 9 points, that is, running from 1 through 9). TE means "total English score," and TA means "total achievement score." The intelligence, TE, and TA points are connected with a solid line, while the battery tests and subtests are joined by dotted lines.
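The relation between percentile ranks and stanines may be sketched as follows. The cut points are an assumption: they follow the commonly cited division of a group into 4, 7, 12, 17, 20, 17, 12, 7, and 4 per cent, which the text does not state, and the treatment of a rank falling exactly on a cut point is likewise a convention of this sketch.

```python
from bisect import bisect_right

# Cumulative percentages at the upper edge of stanines 1 through 8
# (assumed conventional values, not taken from the Siwash profile).
CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Convert a percentile rank (0 through 99) to a stanine (1 through 9)."""
    return bisect_right(CUTS, percentile_rank) + 1

for pr in (3, 25, 50, 75, 97):
    print(pr, stanine(pr))
```

A pupil at the median of the local group falls in stanine 5 under these assumptions.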

In consultation with his adviser, Richard can see at a glance that he is above the Siwash junior class average in general, but that his history, literature, and art scores fall considerably below the other eight points. Richard and his adviser can use this information to good advantage in planning his last two years of college courses.

E. Methods of Comparing Intelligence and Achievement 

One of the most important questions to raise about any pupil is: How well is he getting along in comparison with his capacity? Whenever intelligence tests and achievement tests for the pupil have been expressed in comparable terms, a rough answer to this question is possible. But the problem is far more difficult than would appear on the surface. According to Kelley, about 90 per cent of whatever is measured by a so-called general intelligence test is the same as that measured by an all-around achievement test battery. In like manner he has computed the "community of function" between intelligence tests and arithmetic tests as about 88 per cent and that between intelligence and reading tests as about 92 per cent. Since a "scant one-tenth" of the tests are utilized in the measurement of difference between intelligence and achievement, he points out the serious hazard of such comparisons. It would certainly appear that unless the

... page 208. Yonkers ...



Figure 37. A Profile Based Upon Local Norms

apparent differences are very great, the existence of true differences cannot be safely inferred. Conrad has described the conditions which must be met before scores are strictly comparable.

H. S. Conrad, "Comparable Measures," in Encyclopedia of Educational Research, edited by Walter S. Monroe, pages 340-344. New York: The Macmillan Company ...






The accomplishment quotient. In 1920 Franzen suggested the accomplishment quotient, abbreviated AQ. This is the ratio between EA and MA, or between EQ and IQ. The simplest formula is AQ = 100(EA ÷ MA). A quotient of 100 is considered the goal. For example, if a pupil whose EA is 9-2 has an MA of 10-0, his AQ is 100(9.17 ÷ 10.00) = 91.7, or 92. In like manner, a second pupil might have the same EA, 9-2, but an MA of only 8-3. His AQ would be 100(9.17 ÷ 8.25) = 111. The interpretation of the first case, 92, is that the pupil is not living fully up to his capacity, which seems reasonable enough, human nature being what it is. But the interpretation of the second case, 111, is rather absurd, since it appears to imply that this pupil has exceeded what he is capable of doing by 11 per cent! A more probable explanation is that the quotient is due to inaccuracies in the tests and that in this case the errors in the achievement score were in the direction of making it too high, whereas the errors in the intelligence score were in the direction of making it too low. The resulting quotient has added these errors. If the errors, whether due to chance or otherwise, had been in the same direction, they would have tended to offset each other. One reason the use of IQ and EQ involves less risk is that CA, the divisor, is almost wholly free from errors of measurement if obtained from a birth certificate. Studies by Cureton, Haggerty, Tsao, and others have brought the AQ and similar measures into general disrepute.
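Both the computation of the AQ and the way in which opposite errors in the two ages pile up in the quotient may be sketched as follows. The function names are hypothetical. The pupil in the second half of the sketch, with a true EA and MA of 9-0 and an error of 4 months on each test, is invented for illustration.

```python
def age_to_months(age):
    """Convert an age written as 'years-months' (e.g. '9-2') to months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)

def accomplishment_quotient(ea_months, ma_months):
    """AQ = 100 * (EA / MA), with both ages expressed in months."""
    return 100.0 * ea_months / ma_months

# The two pupils of the text.
aq_first = round(accomplishment_quotient(age_to_months("9-2"),
                                         age_to_months("10-0")))
aq_second = round(accomplishment_quotient(age_to_months("9-2"),
                                          age_to_months("8-3")))

# Hypothetical pupil whose true EA and MA are both 9-0 (108 months),
# measured with an assumed error of 4 months on each test.
true_age = 108
error = 4
opposite = accomplishment_quotient(true_age + error, true_age - error)
same_way = accomplishment_quotient(true_age + error, true_age + error)

print(aq_first, aq_second)
print(round(opposite), round(same_way))
```

For the invented pupil, errors in opposite directions raise a true AQ of 100 to about 108, while errors in the same direction leave it at 100.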

Combining intelligence and achievement scores. Several proposals have been made for combining scores on intelligence and achievement tests, usually for purposes of pupil classification. One of the simplest of these proposals is to average the pupil's rank on the two tests. More refined methods involve the use of some common denominator, such as the standard score. One publication suggests the use of promotion age and promotion quotient as a basis of classification for instructional purposes. Promotion age (PrA) is the average of EA and MA. In this average the two ages may be weighted equally or unequally, whichever seems best for the data in hand. Then the promotion quotient (PrQ) is the PrA ÷ CA. On the face of it, such practice appears to be averaging things as unlike as cattle and horses. But, if Kelley's point regarding the great community of function between intelligence and achievement tests is well taken, the practice would appear to be justified on theoretical grounds. And if it provides a better

Raymond Franzen, "The Accomplishment Quotient," Teachers College Record, 21:432-440, November, 1920.

Edward E. Cureton, "The Accomplishment Quotient Technic," Journal of Experimental Education, 5:315-326, March, 1937.

... Haggerty, "An Evaluation of the Accomplishment Quotient: A Four ...," Journal of Experimental Education, 10 ...

Fei Tsao, "Is the AQ or P Score the Last Word in Determining Individual Effort?" Journal of Educational Psychology, 34:514-526, December, 1943.

... Metropolitan Achievement Tests, pages 38-39. Yonkers ... World Book Company ...



lud? 

F. The Use of Norms in Interpreting Scores on
Personality Tests

As a rule, personality test manuals do not contain elaborate systems of norms. The problem is inherently more difficult than that presented by either intelligence or achievement tests, for which we have seen that the norms are far from ideal. Indeed, the very essence of personality is its uniqueness. It is here that the good judgment and common sense of the teacher are highly important.

Terman strongly questions the possibility, or desirability, of establishing norms for evaluating or adjusting personalities. He says:

The psychologist stands aghast at the self-assurance with which the professional school counselors in America diagnose the personality faults of little children and at the boldness with which they undertake the delicate task of adjustment. The student of genius who is familiar with the motivating influences that have their origin in quirks of childhood personality shudders to think what the result would have been if school counselors had had a chance to "adjust" the personalities of the budding geniuses of history. One can imagine them freed from all their peculiarities and complexes, adjusted to the world as it was, and becoming indistinguishable from the common herd.

On the same point Poffenberger quotes with approval this statement from Burbank, growing out of a lifelong study of plant life:

One of the greatest fallacies of near-science and of amateurs in Nature's school is the belief that only from the normal can we get our best development and results. As a matter of fact Nature shows us again and again that it is from abnormalities that some of our most valuable and beautiful plants arise. From that weak or abnormal plant, that genius plant, may come the very characteristics that we are looking for, and our only problem is to nurse it physically and keep it strong to pass on its overload of spiritual or esthetic excellences to its children.

Probably the professional educator could hardly do better than to accept wholeheartedly the motto of the founder of the eugenics movement in England, which was "Treasure your exceptions." Those who deviate most widely from the average deserve special consideration. It is from this group that geniuses are recruited, as well as social misfits of all types. It is socially undesirable, as well as psychologically impossible, to make everybody alike. A distinguished psychiatrist gives this wholesome comment:
Lewis M. Terman, "The Measurement of Personality," Science, 80:605-608, December
A. T. Poffenberger, "Psychology and Life," Psychological Review, 43:30, January
Luther Burbank and Wilbur Hall, The Harvest of the Years, page 273. Boston
... Mind (Third Edition), page xiv. New York: Alfred A. Knopf, Inc., 1945.





The adjuration to be "normal" seems shockingly repellent to me; I see neither hope nor comfort in sinking to that low level. I think it is ignorance that makes people think of abnormality only with horror and allows them to remain undismayed at the proximity of "normal" to average and mediocre. For surely anyone who achieves anything is, a priori, abnormal; this includes, not only the geniuses, but the presidents, the leaders, and the great entertainers. I presume most of the people in Who's Who in America would resent being called normal.

As a summarizing statement concerning norms, Flanagan's opening sentence is highly pertinent:

Test scores are meaningful and valuable to the extent that they can be interpreted in terms of capacities, abilities, and accomplishments of educational significance.


Selected References for Further Reading


Chauncey, Henry, and Frederiksen, Norman, "The Functions of Measurement in Educational Placement," Chapter 4 in E. F. Lindquist (Editor), Educational Measurement. Washington, D. C.: American Council on Education, 1951.
Cook, Walter W., "The Functions of Measurement in the Facilitation of Learning," Chapter 1 in E. F. Lindquist (Editor), Educational Measurement. Washington, D. C.: American Council on Education, 1951.
Flanagan, John C., The Cooperative Achievement Tests: A Bulletin Reporting the Basic Principles and Procedures Used in the Development of Their System of Scaled Scores. New York: Cooperative Test Service, 1939. 41 pages.
Flanagan, John C. (Chairman), "Establishing the Type of Norms Most Useful and Important for the Interpretation of Achievement Test Scores," pages 65-113 in Proceedings of the 1948 Invitational Conference on Testing Problems. Princeton, New Jersey: Educational Testing Service, 1949. Papers by Lee J. Cronbach, Walter N. Durost, Eric F. Gardner, E. F. Lindquist, Robert L. Thorndike, and Arthur E. Traxler.
Flanagan, John C., "Units, Scores, and Norms," Chapter 17 in E. F. Lindquist (Editor), Educational Measurement. Washington, D. C.: American Council on Education, 1951.

Rulon, Phillip J., "Problems of Regression," Harvard Educational Review, 11:213-223, March, 1941.
Rulon, Phillip J., "On the Concepts of Growth and Ability," Harvard Educational Review, 17:1-9, Winter, 1947.

Seashore, Harold G., and Ricks, James H., Jr., "Norms Must Be Relevant," Test Service Bulletin No. 39, pages 1-4. Psychological Corporation, 522 Fifth Avenue, New York 18, New York, May, 1950.

Stanley, Julian C., "Why Wechsler-Bellevue Full-Scale IQ's Are More Variable Than Averages of Verbal and Performance IQ's," Journal of Consulting Psychology, 17:419-420, December, 1953.

Tiedeman, David V., "Has He Grown?" Test Service Notebook. World Book Company, Yonkers-on-Hudson, New York, 1952. 4 pages.

John C. Flanagan, "Units, Scores, and Norms," page 695 in E. F. Lindquist (Editor), Educational Measurement. Washington, D. C.: American Council on Education, 1951.



PART IV 

Measurement 

in 

Instruction 



Motivation and Practice as
Related to Testing


A. The Problem of Motivation

Importance of motivation. It is generally recognized in ordinary experience that motivation occupies an important place in human affairs. Such familiar proverbs as "You can lead a horse to water but you can't make him drink" and "It is hard to teach an old dog new tricks" assign to motivation a key position. The horse does not drink for the simple reason that he does not want to drink, and the old dog's poor performance is due not so much to lack of ability as to the fact that he has become too well satisfied with the tricks he already knows. In a like manner, every experienced teacher has seen pupils of mediocre capacity succeed because of interest and enthusiasm, while others of more promise have failed utterly because of lack of it. With these observations growing out of ordinary experience the views of psychologists and other keen students of education are in accord.
Meaning of motivation. The term motivation is very inclusive. Literally it means causing movement. A convenient grouping of motives into two major classes is suggested: internal or organic, and external or environmental. In recent years the term drive, or urge, has been used for the former, and goal, or incentive, for the latter. But in the final analysis, motivation, though in some instances externally initiated, always functions internally. Hunger, thirst, and sex, as well as interests, attitudes, wants, desires, and temporary mental sets, are examples of drives. Incentives may be negative, as are pain or punishment, or positive, as are rewards in a multitude of forms. A further distinction is often made between motives which are natural or intrinsic, such as a child's interest in play or the movies, and those which are artificial or extrinsic, such as prizes, marks, grades, credits, and honor rolls.

Relation of measurement to motivation. Measurement is related in at least two ways to motivation. In the first place, there is the problem of the measurement of motivation itself. It is often important to know the differences among individuals in the strength of various motives, the comparative strength of the same motive under varying conditions, and the strength of a given motive in comparison with other motives in the same individual. As the development of wholesome attitudes and interests is an objective of modern education, it is just as necessary to know how to measure it as to know how to measure any other objective. While much valuable work has been done in the measurement of animal drives, up to the present no convenient technique has been devised for measuring human motives in any precise manner. The measurement of motivation in education is, then, a problem for the future.

In the second place, there is the problem of the relation of the measurement of educational capacity and achievement to the motivation of learning and teaching. Since teaching and learning are two aspects of the same process, it is reasonable to expect that measurement will be intimately related to both. Some of the more important relationships will be considered in the next two sections.

B. The Relation of Measurement to Motivation in Teaching 

Purpose of the teacher and measurement. An obvious relationship of measurement and motivation in teaching arises from the fact that the purpose of the teacher determines the type of measurement used. Whether, for example, the teacher gives many tests or few, long tests or short, informal tests or standardized, survey tests or diagnostic, depends upon his purpose. Since not all tests serve the same purpose equally well, as has been pointed out, the choice of the measuring instrument becomes a matter of primary importance. This problem has already been considered at some length in Chapter 4. Certain points to be raised later in this chapter will also have a bearing upon it.

Teaching emphasis and measurement. The proper teaching emphasis is determined by the results of measurement. Measurement directly demonstrates the quality of the pupil's learning, but it also indirectly reflects the quality of the teacher's teaching. In the light of measured results conscientious teachers attempt as far as possible to correct weaknesses in past teaching and to prevent their recurrence in future teaching. Messenger¹ studied the "influence of the Iowa Academic Testing Program in relation to the teaching of English mechanics in an Iowa high school" and found that the effect had been to "motivate teachers to greater effort," with the allotment of more time to the subject and the use of more drill material. One of the chief values of measurement may well be its motivating effect upon the teacher.

¹ Unpublished master's thesis, University of Iowa, 1931.

Taba early realized this relationship and spoke regretfully of the "formidable and serious handicap" of the progressive schools due to the "lack of forms of testing that are in harmony with their aims and adequate to their purposes." Then she added these significant words:²

After all, one teaches only what one in some way or another is able to evaluate as an outcome of that teaching. If we are unable to evaluate the growth of integrations and meanings and ways of behavior, we are unable even to form an adequate notion of them, still less to guide the process of learning in these terms.


Attention should be called to the fact that Taba's recognition of the limitations of existing measurement does not blind her to the important motivating influence of measurement, whether good or bad, and to the urgent necessity for continuous improvement of measuring instruments.
There is, of course, some danger that the content of the examination may exert too great an influence over the teaching emphasis and curriculum content. It has often been alleged that something like this has happened in the case of the New York Regents and the College Entrance Board examinations, where an important effect of such examinations has been to turn secondary schools into "cramming" schools pointing toward the probable content of such examinations as revealed by past examinations by the same agencies. Insofar as this is true, it represents unfortunate and unwarranted control over teaching procedures, which not only defeats the purpose of the examination but places an obstacle in the way of education itself. Such practice fails to take into account the sampling nature of all tests. Any attempt to drill pupils in advance specifically upon the items of the test tends to narrow teaching to the scope of the testing, so that the
... rather than ... a random sampling of the ... This should be clearly understood, and every reasonable precaution should be taken to insure that state-wide testing programs and other forms of academic competition do not promote mere monotonous drill exercises of an especially narrow and vicious type.

A few studies have been made of the effect of exempting pupils from final examinations. An early study by Anderson³ indicated that exempting high school pupils who reached a minimum standard from final examinations had "played havoc with teachers' grades," while the actual performance of the pupils had shown "no appreciable increase." Apparently the motivation had been in the wrong place. That such a disastrous result need not occur is indicated by a subsequent study of the same problem reported


² Hilda Taba, The Dynamics of Education. New York: Harcourt, Brace and Company, Inc., 1932.
³ C. J. Anderson, "Is the Exemption System Worth ..." School and Society, 357-360, March 4, 1916.





by White,⁴ who found that the general distribution of marks in his school had changed very little under the exemption system. Even here, however, was found a "decided dip in the distributions immediately below the exemption point of 85 per cent and a corresponding rise just above it." Schools employing an exemption system should remain constantly on guard lest its effect be more to stimulate the teachers to give high marks than the pupils to earn them.


C. The Relation of Measurement to Motivation in Learning

Close as is the relationship of measurement to the motivation of teaching, the relationship is even closer to the motivation of learning. Elsewhere Ross stated the problem as follows:⁵

Behind the act of learning is the capacity to learn, and back of the capacity is the motive to learn, the desire, urge, impulse, drive, or something, that makes the creature want to learn, that pushes him out to meet his environment. One of the reasons why most of the correlations of mental capacity with actual achievement, in school and out, have been disappointingly low, has been that students of real ability have not felt a proper urge to work, while those of mediocre talent have frequently possessed the urge to achieve.

If we are ever to be successful in our efforts to predict achievement, therefore, we must not be content with merely analyzing the learning process, understanding the mechanism of learning, its structure and laws of operation, nor with merely exploring the height and range of human possibilities, but we must also find out about the dynamic aspects of human nature. We must discover not only how the mind works, but why it works when it does and the way it does.


The foregoing statement is an introduction to a report of an experimental attack on one phase of motivation. The study will be briefly presented here as an illustration of one type of psychological experiment concerned with this important problem. Afterwards some critical comments upon this and other similar experiments will be given.
An experiment in motivation. The problem was to determine the influence of a knowledge of results upon the achievement of 59 college students in a simple act of motor skill, making tally marks. The procedure was as follows. Upon the basis of an initial practice period three equivalent groups were formed. One of these, the control, had no knowledge of its progress throughout the first ten practice periods of one minute each. During this time one experimental group had full knowledge of results, and the other had partial knowledge. At the beginning of each practice period after the first, each pupil in the group with full knowledge was shown his paper of the preceding day, with scores and corrections indicated. A distribution of scores for this group was placed on the board, and each student could see his progress, both relative and absolute. In the group with partial knowledge of results, each student was told where he stood relative to the average of the group, but that was all. After the first ten periods the conditions were reversed; the group which had had no knowledge of results was then given full knowledge for two additional periods, and the other two groups were given no knowledge for those two periods.

⁴ Clyde W. White, "The Effects of Exemptions from Semester Examinations on the Distribution of School Marks," School Review, 39:293-299, April, 1931.
⁵ Clay Campbell Ross, "An Experiment in Motivation," Journal of Educational Psychology, 18:337-346, May, 1927. Page 337.
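The account above does not say how the three equivalent groups were formed from the initial practice scores. As an illustration only, the following Python sketch shows one common way of equating groups: rank the subjects on the initial score and deal them out in serpentine order. The function name `form_equated_groups` and the scores are hypothetical and are not taken from the original experiment.

```python
from statistics import mean


def form_equated_groups(initial_scores, n_groups=3):
    """Split subjects into n_groups with similar initial-score distributions.

    initial_scores: dict mapping a subject label to an initial practice score.
    Returns a list of n_groups lists of subject labels.
    """
    # Rank subjects from highest to lowest initial score.
    ranked = sorted(initial_scores, key=initial_scores.get, reverse=True)
    groups = [[] for _ in range(n_groups)]
    for position, subject in enumerate(ranked):
        block, offset = divmod(position, n_groups)
        # Reverse direction on every other block (serpentine order) so that
        # no single group always receives the best subject of each block.
        index = offset if block % 2 == 0 else n_groups - 1 - offset
        groups[index].append(subject)
    return groups


# Hypothetical initial scores (tally marks per minute) for 59 students.
example_scores = {f"S{i:02d}": 40 + (i * 7) % 23 for i in range(1, 60)}
groups = form_equated_groups(example_scores)
group_means = [mean(example_scores[s] for s in g) for g in groups]
```

With 59 subjects the groups cannot be exactly equal in size, but their mean initial scores come out close together, which is the sense in which such groups may be called equivalent at the start.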

The results are shown in Figure 30 on page 206. On the whole, they seemed to justify the conclusion that "the addition of a single other motivating factor, knowledge of results, is sufficient to give the pupils with such knowledge a distinct superiority over the others, and the degree of superiority is roughly proportional to the amount of information possessed."

Limitations of experiments on motivation. The experiment just summarized illustrates several of the weaknesses of the experimental work so far reported on motivation and, for that matter, on other phases of learning as well. These may be conveniently grouped under three headings: factors studied, subjects used, and conclusions drawn.

In the first place, the factors so far studied leave much to be desired. Most of the studies involved are concerned with highly artificial and often trivial tasks. Take the above experiment as an example. It is highly improbable that students will show much enthusiasm over making, as rapidly as possible, groups of four vertical lines crossed horizontally with a fifth. Tallying, to be significant to most persons, would have to be employed as a record of some athletic contest or other situation in which it is a means to an end and not an end in itself. A survey of the literature reveals that a large percentage of motivation studies have been concerned with making legible a's, canceling numbers or letters, assigning a number to a dictated word, learning trivial facts or actual misinformation, running mazes, and the like. What we need to know is how people behave under actual school conditions. Even when arithmetic and other school materials are employed, the experiments are rarely carried on long enough for the novelty of the task to wear off and for the experimental factor to operate under reasonably normal conditions. The total learning time is frequently less than one hour, and often it is only ten to fifteen minutes, as in the above laboratory experiment. To be most helpful in guiding teachers in the day-by-day conduct of their classes these experimental factors must be continued for at least several ...

In the second place, the choice of subjects for the experiment has usually been rather unfortunate. Many of the laboratory studies of the effects of rewards and punishments have been limited entirely to animals. No matter how thorough a believer one might be in evolution, one must admit that the behavior of rats in a maze or cats in a puzzle box may be different in essential respects from that of school children facing the intricacies of a foreign language or learning to manipulate the abstract symbols of algebra. Even when human subjects have been used, as they have been in many experiments, they have usually been adults, often students of psychology. Frequently, also, the number of subjects has been small, with a poorly equated control group, or with none at all. To be most convincing as a guide for school practice these experiments must be performed with children of approximately the same age and type as those found in the schools where the results are applied. If our studies of either maturation or learning are to be relied upon, the child is not merely a miniature adult. And if the numerous studies of individual differences from the time of Galton to the present have established anything, it is that what is true of one person is not necessarily true of another. Experiments based on a handful of subjects, therefore, must be accepted with considerable discount.

In the third place, the conclusions drawn have often gone far beyond the experimental facts available. This is mainly the result of the two limitations already mentioned. If experimenters would be content to draw conclusions from, and to make applications to, the same or closely similar subjects and to the same or closely similar tasks, no harm would be done. But this is rarely the case. To generalize from one age level to another is risky even when the task is the same, but it is particularly hazardous when the activity itself is different. Yet this very thing is commonly done. A meaningless and often trivial act is performed by adults under the highly artificial conditions of the laboratory. Then the results are applied without qualification to the meaningful learning of children under the actual conditions of the schoolroom. That this procedure is wholly unwarranted will appear from an experiment by the writer to be reported later in the chapter. But to generalize from the behavior of a rat in a psychological laboratory to that of a child in an ordinary classroom is little less than foolhardy.

Types of motivation experiments. From what has just been said, it may appear that the writers believe all motivation experiments to be worthless. Such is far from their conviction, however. While they believe that practically all such experiments so far reported have certain weaknesses to which attention should be called, they are convinced that genuine progress has been made and that the way has been opened for further studies to supplement those already in existence. It has been definitely shown that the problem, although difficult, is susceptible to experimental attack. Furthermore, experimental evidence already available is sufficient to provide at least tentative answers to two important questions:


1. What is the relation of measurement to the amount and quality of learning?
2. What is the relation of measurement to the type of learning or to the learning procedure followed by the student?


These two questions will now be considered.





1. The Relation of Measurement to the Amount and Quality of Learning


There is considerable experimental evidence regarding the influence upon the amount and quality of learning of three measurement factors, or groups of factors:

a. The frequency of the tests
b. The knowledge that a final examination would be given
c. The knowledge of results or progress in learning

Attention has been given to the operation of these factors individually, in combination with each other, and in combination with other motivating factors such as praise and blame, rivalry, and various types of material rewards. Some of these findings will now be summarized.
Frequency of tests. Practice varies widely regarding the frequency of testing. At one extreme are the teachers who give no written examination of any kind, and at the other extreme are those who give a test of some kind every day. What experimental evidence is there to indicate the proper frequency? Manifestly, whatever advantage may exist in frequent testing cannot be attributed solely to the motivating effects, however, since the additional practice afforded by taking the extra tests must also be considered.

The experimental evidence regarding proper frequency of testing in social studies classes in high school has not been very convincing. Hoglau⁶ studied the frequency of testing in American history in one Iowa high school. He found no significant differences among three groups, equated on the bases of intelligence and knowledge of history. One group had daily tests, another had three unannounced tests per week, and a third had only the regular tests at intervals of six weeks. In a similar study in the same subject Camp⁷ found a slight (but not a statistically significant) difference between one group tested once or twice a week and another tested once in two or three weeks. In an earlier study in community civics Shore⁸ found no advantage in giving each day a true-false test of ten items, but a group given an unannounced test two or three times a week did show a statistically significant superiority over a group given only the mid-semester and the final tests. On the whole, the evidence appears to favor slightly the practice of giving a test once or twice a week to classes in the social studies in high school.
Two experiments on the effects of frequent testing of high school biology students reported conflicting results. Kitch⁹ found that the group taught with the aid of self-scored unit tests did significantly better than the group


⁶ Unpublished master's thesis, University of Iowa, 1932.
⁷ Unpublished master's thesis, University of Iowa, 1931.

1932 





without such tests. Gable¹⁰ compared the merits of three procedures. One group was told that it would be tested each day, another group that it would be given announced unit tests, and a third group understood that it would be tested without notice at irregular intervals. On the whole, the poorest record was by the group taking daily tests, but there was a tendency for the slower pupils to do better when a test was announced, which gave time for review.

Conner¹¹ found that the use of a well-known series of instructional tests in high school physics had not resulted in sufficient improvement in learning to justify the time expended. Kugic¹² reported that short daily tests in physics resulted in pupils' having a small superiority over those to whom tests were given only at the ends of units. Kirkpatrick¹³ found a distinct advantage, in the 26 high school physics classes included in his study, in giving an objective test at the beginning of each unit. As each unit covered from one to three days, this meant that tests were given at least twice a week. The pupils had definite knowledge that the test would be given, that it would cover all the important concepts of the unit, and that the final examination would include only points included in these unit tests. The tests were corrected in class and were used as a basis for class discussion and subsequent study. Both experimental and control groups took the same term tests at intervals of six weeks. When the experimental groups were considered as a whole, a highly significant statistical difference was found on a test of objective information given at the end of the course, but this superiority had largely disappeared four months later. The testing program was most beneficial to the pupils in the lowest third in mental ability. This suggests that schools attempting to group pupils according to ability may very well consider varying the testing program as well as the curriculum and teaching methods.

Experiments involving the frequency of testing have been most numerous and, on the whole, most convincing on the college level. A serious limitation, however, is that they have been largely restricted to classes in general and educational psychology. Jones,¹⁴ in a pioneer study, gave five-minute completion tests, euphoniously called "terminal reviews," at the end of each of 27 lectures in psychology. Eight weeks later the groups so tested made scores on a final examination that were approximately twice as high as those of groups who had had no "terminal reviews." Another


■■ Unpublished master's thesis, University of Iowa, 1932.
” Unpublished master's thesis, Pennsylvania State College, 1936.

ProeSS'" * J j' Motivating Etfect of a Specific Type of Testing 

® June 15 1934 

4 3“7VNTeinb^S““®‘“'^“”''^"'^'Teachm rlrchire, o/ PsyeW 





study¹⁵ reports advantages to be gained in using weekly objective tests in general psychology.

Both Turney¹⁶ and Keys¹⁷ found weekly tests in educational psychology better than tests given less frequently. Turney found that when given weekly tests a class which was well below another class at the beginning was able to equal the achievement of the other class, which had only one short test in addition to the mid-semester and the final, which both groups had. Keys found that eight weekly tests gave an advantage over the same items given to equivalent groups in the form of two monthly tests. However, on an unannounced examination covering the same material given ... weeks later, this advantage had been reduced. When the regular final examination came after the additional two weeks, the achievement of the experimental and control groups was practically identical. What the effect of the weekly testing was after still larger intervals is unfortunately unknown.

Johnson¹⁸ compared the effect of written unit tests and the effect of an equal amount of time devoted to oral reviews with 55 pairs of freshman ... development. She found that a statistically significant difference in favor of ... of greater achievement which has been induced by examinations persists after six ...

A few ... little or no advantage in weekly tests even when comparisons ... at the end of the course. For example, ... at the University of Minnesota gave weekly tests in general psychology ... negative results ... slight superiority in ... educational psychology. However, Ross and Henry in ... psychology, and Noll in ... that the benefit of weekly tests was ... students of low ability. It is evident that there is no one

¹⁵ C. C. Ross and Lyle K. Henry, "The Relation between Frequency of Testing and Progress in Learning Psychology," Journal of Educational Psychology, 30:604-611, November, 1939.
¹⁶ Austin H. Turney, "The Effect of Frequent Short Objective Tests upon the Achievement of College Students in Educational Psychology," School and Society, 33:760-762, June 6, 1931.
¹⁷ Noel Keys, "The Influence on Learning and Retention of Weekly as Opposed to Monthly Tests," Journal of Educational Psychology, 25:427-430, September, 1934.
¹⁸ Bess E. Johnson, "The Effect of Written Examinations on Learning and on Retention of Learning," Journal of Experimental Education, 7:55-62, September, 1938.
¹⁹ A. C. Eurich, H. ..., and M. Wilder, The Effective College Curriculum as Revealed by Examinations, pages ... Minneapolis: University of Minnesota Press, 1937.
²⁰ Victor H. Noll, "The Effect of Written Tests upon Achievement in College Classes: An Experiment and a Summary of the Evidence," Journal of Educational Research, 32 ...
²¹ C. C. Ross and Lyle K. Henry, op. cit., pages 609-610.





best testing technique which is equally effective under all conditions. Testing methods, as well as other teaching procedures, must consider the ability of the student as well as the nature of the subject.

Kulp²² gave the students in a graduate class in educational sociology who were below the median on the mid-semester examination a weekly ten-minute objective test for the next seven weeks. The students above the median were excused from these short tests. On the final examination, "identical in all respects with the seven weekly tests," the superiority of the upper half was reduced considerably, probably due largely to practice and regression effects rather than to increased motivation. Pressey²³ reports an interesting variation of this procedure as used by Smeltzer in educational psychology. Both experimental and control classes were given weekly tests. But in the experimental class, to whom the test was given on Thursday of each week, the papers were returned and discussed on Friday. Those who had made unsatisfactory scores were tested again over the same material after a brief review on Monday, while the others were excused. On the final examination the experimental group was above the control group, the advantage being largely with the pupils who were in the lowest fourth of the class and who had taken the retests.
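The regression effect mentioned in connection with Kulp's study can be illustrated with a small simulation. The sketch below is not a reanalysis of his data; the ability and error distributions, the sample size, and the function name `simulate_regression` are assumptions chosen for illustration. Each simulated student receives a midterm and a final score that share the same underlying ability but have independent errors, and no extra testing is given to either half.

```python
import random
from statistics import mean


def simulate_regression(n_students=2000, seed=0):
    """Return the upper-minus-lower gap on the midterm and on the final.

    The halves are formed by splitting at the midterm median. All numbers
    are hypothetical: ability ~ N(50, 10), measurement error ~ N(0, 10).
    """
    rng = random.Random(seed)
    ability = [rng.gauss(50, 10) for _ in range(n_students)]
    # Two fallible measurements of the same ability.
    midterm = [a + rng.gauss(0, 10) for a in ability]
    final = [a + rng.gauss(0, 10) for a in ability]

    cutoff = sorted(midterm)[n_students // 2]
    upper = [i for i in range(n_students) if midterm[i] >= cutoff]
    lower = [i for i in range(n_students) if midterm[i] < cutoff]

    gap_mid = mean(midterm[i] for i in upper) - mean(midterm[i] for i in lower)
    gap_final = mean(final[i] for i in upper) - mean(final[i] for i in lower)
    return gap_mid, gap_final


gap_mid, gap_final = simulate_regression()
```

Under these assumptions the gap between the halves on the final is smaller than on the midterm even though neither half was treated differently, which is why a reduced superiority of the upper half cannot by itself be credited to motivation.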
Three of the above experiments attempted to get the students' attitude toward the frequent testing. By means of unsigned questionnaires in three classes Jones found that 70 per cent of the students approved the "terminal review method." In like manner Turney discovered an "excellent attitude" toward frequent testing in his experimental group: about 83 per cent thought they had studied more, and over 90 per cent said that they preferred to be in that section and that they felt they had learned more even if they had made no better grade. From "an extensive questionnaire touching some thirty issues of educational theory and practice," given at the opening and repeated near the end of the semester, Keys found: "Without comment by the instructor or knowledge of the experiment in progress, students disclose a strong and growing conviction of the desirability of tests given as frequently as every second, third, or fourth class session." The evidence strongly suggests that students favor frequent testing.

Awareness of final examination. To what extent is the "intention to remember" or "temporal set" a factor in learning? Will the expectation that the material will have to be recalled later influence the amount retained? More specifically, how will the awareness of a final examination affect the progress of learning and of forgetting? One or the other of the following "two rival and mutually exclusive hypotheses," as suggested by Remmers,²⁴ is apparently true:


Daniel H. Kulp II, “Weekly Tests for Graduate Students,” School and Society,
38:157-159, July 29, 1933.

Sidney L. Pressey, Psychology and the New Education, pages 363-366. New York:
Harper & Brothers, 1933.


n H tip. ni r. m 1 oth.r. Fx.n ptio tmm r,ll t.p qrmi~tpr I xaitp H.o ! 
; r( p I V /.«•! II, I r I ,| i r HI! 




... relationships, to reason in terms of the subject matter, to apply ... to
significant problems, etc., and in general more effective and permanent
learning.


Thisted and Remmers summarized the literature on the general problem,
including studies on such dissimilar materials as stories, objects shown,
vocabulary, nonsense syllables, photographs, and stylus maze. They
concluded: “It is evident that a condition of expectation of recall when
injected into the initial instructions has given variable and conflicting
results.” Their own study, which included 404 psychology students, involved
learning Anglo-Saxon vocabulary and the factual content of two articles
presented in mimeographed form under ordinary classroom conditions. The
control group understood that they were to be tested immediately, and
the experimental groups understood that they were to be tested later also,
after three days in some cases, after one week and after two weeks in other
cases. The experiments tended to establish a somewhat slower drop in the
forgetting curve when a set to prepare for delayed recall was introduced.
While the learning material in the above experiments was not left in the
hands of the students, it is reasonably sure that one effect of the “temporal
set” was to cause those who expected to have to recall the material later
to give a “mental review” of what they could remember, as well as probably
to exchange ideas with other students. Under ordinary school conditions
one might expect that the effect of reviewing for an expected examination
might be larger. However, Remmers later found that exempting students
in mathematics and applied mechanics made relatively little difference
in the amount, quality, or permanence of learning, at least as measured by
current types of tests and examinations.

Pease reported some interesting studies of the effect of cramming
on the amount of class materials retained. The first study included several
classes (in all, 302 college students and 106 high school pupils) separated
into equivalent groups on the basis of intelligence. A test of 100 objective
items was prepared for each class, “covering several months of the usual
course work already completed by the students.” At a meeting of the class
the purpose of the experiment was clearly explained. The experimental
group in each class was then dismissed with instructions to spend at least
an hour in review for the examination which was to come at the next
meeting of the class. The control group in each class took the examination
Thisted and H. H. Remmers, “The Effect of Temporal Set on Learning,”
Journal of Applied Psychology, 16

Journal of Educational Psychology, 21, ...-277, April





at once without warning or review. The mean score of the experimental
group exceeded that of the control group in each class, the average
superiority being 11.1 points on the 100-item test. Without warning the test
was repeated six weeks later. The average lead of the experimental group
had been reduced to 5.3 points. But there was still a significant difference
for all classes containing as many as fifteen pairs of pupils. However, when
one of these classes was retested after an additional six weeks, the lead of
the experimental group was reduced by about half and was not then
significant. After twelve weeks the lead in another class had been reduced
from 17.07 points to 2.7 points, which was not a significant difference.
These results had been produced by an amount of “cramming” by the
experimental groups that represented about one and one-half hours, on the
average. It appeared probable that the time so spent yielded returns that
averaged higher than the same amount of time spent either in class
attendance or in regular preparation outside. Pease concludes that “from the
standpoint of the student, it pays to cram.”

Tyler and Chalmers studied the effect on test results of warning junior-
high-school pupils that they would have a unit test in general science on
the following day. The test scores of pupils so warned were compared with
those of comparable pupils who had no special warning, although they were all
aware that it was customary to have a test at the end of each unit, usually
with the time announced at least two days in advance. All of the obtained
differences favored the warned groups, but by margins below the level of
statistical significance. Six weeks later, when the tests were repeated, the
differences had practically disappeared. The authors questioned whether
junior-high-school pupils are really motivated to study for unit tests even
when announced, or know how to study effectively when they try. To be
effective, motivation has to be intelligently directed.
White conducted an experiment that bears directly upon the effect of
exemption from a final examination. Three classes in general psychology
which met once a week for seventeen weeks were divided, according to
chance, into experimental and control groups. At each weekly class meeting
both groups were given a “comprehensive mimeographed true-false test
covering the chapters studied for the period.” From the outset the control
groups understood that their marks in the course would be based solely
upon these weekly tests, while the experimental groups understood that
they were to have a final examination that would count 50 per cent toward
their course marks. At the class meeting following each test the corrected
papers were returned to all students, and they were allowed to keep the
papers. The experimental groups were urged to preserve these test papers


Tyler and Chalmers, “The Effect on Scores of Warning Junior High School
Pupils of Coming Tests,” Journal of Educational Research, 37, December




... would contain exactly the same ... the final examination was given ...
the students in the control ... determine the value of the experiment. The
difference between the groups was 51.2 per cent, the experimental group
having gained 31.6 per cent and the control group having lost 19.6 per cent.
Even more convincing was the equal superiority of the experimental group
on a completion test “with which they were wholly unfamiliar.”
Knowledge of test scores. What is the effect upon the course of learning
of the knowledge of progress, afforded by test scores or by other means?
The answer to this question has been sought many times in the psychological
laboratory, with practically unanimous results. Psychologists are in
agreement with the conclusion of an early study that “the
addition of a single other motivating factor, namely, knowledge of results,
is sufficient to give the pupils with such knowledge a distinct superiority
over the others, and the degree of superiority is roughly proportional to the
amount of information possessed.” However, as has been pointed out earlier
in the chapter, experiments conducted in the classroom are far more
convincing. We shall now take a look at what they have shown.


One of the earliest and most comprehensive of these studies conducted
under actual schoolroom conditions was that of Panlasigui. The findings
were based on 358 pairs of pupils in fourth-grade arithmetic in ten cities.
The practice material consisted of fifteen minutes' drill in examples of the
mixed type of fundamentals once a week for twenty weeks. As all pupils
scored their papers after each drill, it can be seen that each pupil knew his
achievement for the day, although this knowledge must be related to previous
records in order to be a knowledge of progress, strictly speaking. In the
experimental classes the idea of progress was stressed, progress charts for
both the individual and the class being kept in a conspicuous place. The
teachers of the control classes, on the other hand, were instructed as follows:
“Please keep very much out of class discussion any reference about how
much pupils are scoring.” The comparison appeared to be, then, between
experimental classes with somewhat more stress on progress, and control
classes with somewhat less stress on progress, than is customary to the
ordinary teacher. On a comprehensive test the mean of the experimental
classes exceeded that of the control classes by U 34. A detailed examination
of the results reveals the fact that this superiority is most in evidence in
the highest quarter and practically non-existent in the lowest quarter. The
beneficial effect of awareness of success, then, was substantially in direct
proportion to the amount of success available for motivation. This is also


o iTdoTpaSl rl'mct .!f bC'.cZ.t 

OU-CM BloomuiBton, Illinois !«li-«.l tbihli-luns ‘-"'“P*''-* 





true of the drill periods themselves, where the accuracy standards of the
highest fourth of the experimental group exceeded those of their controls
eight times out of eleven, whereas the lowest fourth of the experimental
groups fell behind their controls on every drill. This experiment seems to
have established rather definitely two important points:

1. A knowledge of progress in learning under classroom conditions is likely to
have much less effect than that under laboratory conditions.

2. A knowledge of progress is likely to be more beneficial to good students than
to poor.


Studies since reported have confirmed the first point, but most of them
have not been analyzed with respect to the second point.

Forlano conducted a comprehensive series of experiments in grades
four to eight, inclusive, involving in all 1,294 pupils, and touching upon
various aspects of the problem of the effect on learning of a knowledge of
results. The experimenter emphasizes the fact that these studies were made
“in the normal classroom situation as far as possible as a part of the
daily school routine.” He attempted to determine whether giving a
knowledge of results immediately after the word had been spelled or an
arithmetic fact had been studied was more effective than when a knowledge
of results was withheld until an entire column of 20 or 24 items had been
attempted. In other words, if one may use the analogy of the target range,
Forlano was interested in finding out, so to speak, whether it was better to
tell the marksman his score after each shot or to wait until he had fired a
series of 20 shots. The author's conclusion is as follows:

The results of our experiments show that there is a tendency for learning during
which the learner ostensibly receives immediate knowledge of results to be less
efficient than learning in which knowledge of results is delayed. In general, it may
be said that this superiority of the “delayed knowledge of results” method does not
always approach statistical certainty.


Even this modest conclusion, the author suggests, is limited by the fact
that the methods employed “may not be ‘pure’ methods of what they
purport to involve,” and that the apparent superiority of the delayed
procedure may be due to other causes. In any event, since the period of delay
never exceeded five minutes, little light is shed upon the ordinary school
situation, where the tests follow learning after an interval ranging from a
day to a year or longer.

Brown reports an experiment in arithmetic in grades 5A and 7A. Both
his procedure and his conclusions differed somewhat from those of
Panlasigui. In grade 7A, Brown selected his experimental and control groups,
which were only roughly equivalent, on the basis of an intelligence test,
and in grade 5A on the basis of estimates of intelligence and achievement.
The groups were reversed at the end of the first period of ten days. The
drill period was eight to ten minutes daily. While the differences, on the
whole, favored the experimental group, they were not very impressive.
An examination of the individual drill periods reveals the fact that the
progress from day to day in all groups was irregular and somewhat
inconsistent, and that the differences between experimental and control groups
were generally less on the tenth day than on the first. There was some
evidence in Brown's study that the incentive was somewhat more effective
with boys than with girls, but the outstanding fact was the remarkably
small amount of influence, taken from any point of view, of a knowledge
of progress in the classroom as compared with the laboratory.

... 1936. Bureau of Publications, Teachers College, Columbia University.

Ibid., page 99.

Deputy conducted in a state university a carefully planned experiment
with three groups of students, of approximately equal intelligence, in
freshman philosophy, which met twice a week. For six weeks during the first
half of the semester the first ten minutes of each class meeting of the control
group were devoted to an oral review of the preceding lesson. One of the
experimental groups had a ten-minute objective test covering the same
material, and the other experimental group had the same items in a
twenty-minute test given once a week. Beginning at the middle of the semester
the group which had served as a control was given the ten-minute test at
each class meeting, while the other two groups had only the oral reviews.
The scores for the experimental groups were put on the board following
each test, and each student was urged to keep a record of his progress.
Only one of the three comparisons between the ten-minute written test and
the ten-minute oral review showed the former to be superior by a
statistically significant amount. This fact the author ascribed to a particularly
favorable attitude on the part of the students. The experimental group
which excelled happened to be slightly the most intelligent of the three
and also showed itself superior to the group which took the twenty-minute
test once a week. Deputy's most significant conclusion was: “Considerable
precaution should be taken in applying principles derived from laboratory
and other non-classroom situations to work in school subjects.”

Two years later Ross began a series of experiments which were to force
him to this same conclusion. Attention has already been called to the
earlier laboratory experiment which had appeared convincing not only
to the author at the time but also to many readers since that time, judging


. E C Deputy 10 ot Sacra, a= “ ■““‘T'Tou'f "" 

■'zfgZTrtfrzLfnZ “ Sr 

Jo o,alojrd.colonal P»»d.oi»w 2i «X> eiO November lUJi 

* S < >0011 oto 5 




from the writers on educational psychology who have quoted it with
approval.

Upon the basis of a comprehensive examination given at the end of the
first unit in a class in tests and measurements, a large class was divided
into four substantially equivalent groups. A regular class test was given to
all students once a week for the next two months. At the next class meeting
following a test, a distribution for the entire class was put on the
blackboard and a brief discussion given of each item missed by any considerable
number. But the four groups were given different degrees of information
as to progress. One group was given no knowledge whatsoever as to its
scores. A second group was given vague knowledge, each student being told
merely that his score was “good,” “fair,” or “poor.” A third group was
given partial knowledge, each student being told his point score but not
allowed to see his paper. The fourth group was given full knowledge, each
student being shown his paper at the close of the class and allowed to ask
any questions he wished to ask regarding it.

Figure 38 shows the results for the four groups in the form of cumulative
scores, week by week, for the first eight weeks and for the last four weeks,





... two other classes in the same subject and with one in a different
subject. Not content with this, Ross persuaded a colleague in another
... experiment. The groups involved more
than 50 tests and about 300 students, and not once did there appear a
difference favoring the group with full knowledge of progress that meets
the minimum requirement for statistical significance.
Two conclusions seem reasonably certain. The first, directly in line with
that of Deputy, is as follows:


The result of the laboratory situation is so different from that of the life
situation outside that it is hazardous to generalize from one to the other. One
can never be certain what the outcome of a laboratory experiment will be when
applied to the classroom situation until it has actually been tried out in that
situation.


The second conclusion is that most if not all experiments relating to
knowledge of results in learning have involved another erroneous
assumption, namely, that because students were not told their individual scores
they “had no knowledge of progress.” Certainly they had their subjective
impressions. To test out the accuracy of these impressions the author
requested the students in the no-knowledge group to estimate the scores
that they thought they had made when they turned in their papers at the close
of the tests. The median coefficient of correlation between these estimates
and the actual scores was .71. Manifestly, then, such studies involve a
comparison of two kinds of knowledge, subjective and objective. Moreover,
there was a tendency for the poorer students to overestimate their scores.
In such cases the illusion of success may very well have proved more
stimulating than the reality of failure.
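
A coefficient of correlation between estimated and actual scores can be
illustrated with a short computation. The following is only a minimal sketch
in Python, assuming the ordinary product-moment (Pearson) coefficient; the
text does not say which coefficient was used. The function name `pearson_r`
and both score lists are hypothetical, invented for illustration, and are not
data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no spread")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: what each student guessed, and what he actually made.
estimated = [12, 15, 14, 18, 20, 17, 22, 19]
actual = [8, 13, 15, 16, 21, 14, 24, 20]

r = pearson_r(estimated, actual)
print(round(r, 2))
```

A coefficient near 1.00 would mean that the students' estimates follow their
actual scores closely; a coefficient near zero would mean that the estimates
carry little information about the scores.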

Knowledge of results combined with other incentives. It is probably
rare that a knowledge of progress operates alone. It is likely that
such factors as rivalry and social recognition are always involved in some
degree. But in the experiments so far reported these other factors were not
emphasized. In many experiments, however, the knowledge of progress has
merely been taken as the occasion for utilizing other motives such as praise
and blame, rivalry, money or other rewards, and the like.
At least two studies have attempted to use a knowledge of intelligence
scores as an occasion for verbal suggestion and other forms of motivation.
Mitchell divided the lowest fourth of the freshman class in a high school
into two equivalent groups on the basis of the Otis tests. Each pupil in the
experimental group received the following notice without further comment:


C. C. Ross, “A Needed Emphasis in Psychological Research,” Psychological

Mitchell, “Why Do Pupils Fail?” Junior-Senior High School Clearing House,
9:172-176, November 1934.





Dear Pupil:

Your score on the Intelligence Test which was given at the opening of school
is LOW. This will mean that much work and effort on your part will be necessary
to keep up with the class. Put yourself to the task and show that you can do it.
YOU CAN IF YOU WILL.

Principal

At the end of the year it was found that 62 per cent of the group which had
received this notice passed on all subjects, while only 15 per cent of the
equally poor group, which had not been notified, did so.
Ross conducted a somewhat similar study with college students at the
University of Kentucky. From the lowest fifth in intelligence, experimental
and control groups of 40 freshmen each were formed upon the basis of
psychological tests, sex, and fraternity affiliation. The students in the
experimental group were then called together, and a frank statement was
made regarding their scores. They were told that it was important at the
outset to recognize the fact that they were up against a somewhat different
situation from that of the students with higher test scores. They were as-


TABLE 33

Point Standing for the First and Second Semesters for Low-Ranking
Freshmen Who Were Told Their Intelligence Test Scores
as Compared with Those Who Were Not

Point Standing | First Semester: Arts & Sci. (Exp.* / Con.†), Commerce (Exp. / Con.), Total (Exp. / Con.) | Second Semester: Arts & Sci. (Exp. / Con.), Commerce (Exp. / Con.), Total (Exp. / Con.)

1.80-1.99
1.60-1.79
1.40-1.59
1.20-1.39
1.00-1.19
 .80- .99
 .60- .79
 .40- .59
 .20- .39
 .00- .19

3 

4 

1 7 

4 

3 

1 

1 

1 

2 

2 

a 

0 

3 

3 

2 

1 

1 

1 

2 

1 

3 

7 

2 

2 

3 

2 

7 

1 

3 

9 

3 

7 

8 

2 

1 

1 I 

2 

2 

7 

8 

5 

10 

4 

2 

1 

2 

4 

1 

3 

3 

1 

3 

3 

1 

2 

6 

3 

1 

4 

1 

1 

3 

1 

1 

2 

1 

1 

1 

6 

3 

, 1 

: 2 1 

0 

5 

2 

8 

5 

1 

4 

3 

1 

7 

4 

7 

7 

3 

Total 

24 

24 

16 

16 

40 

40 

20 

21 




44 



78 

83 

45 

94 

61 

85 

88 




G9 



41 

46 

2o 

41 

39 

50 

49 

4o 

22 

4o 

46 

Me-Mc 

1 20 

1 38 

1 30 


03 


12 

16 


* Experimental group.
† Control group.


... Freshmen Be Told Their Scores on Intelligence Tests?” School and Society,
47:678-680, May 21, 1938.





... had. The control group ...

The record of the two groups is summarized in Table 33. The mean point
standing of the experimental group was .94 for the first semester and .85 for
the second semester, while the corresponding values for the control group
were .61 and .69, respectively. During the first semester three times as
many students in the experimental group as in the control group made a
point standing of 1.00 or better, and more than twice as many made this
standing the second semester. Approximately twice as many experimental
as control students passed all subjects. On the whole the difference was
more marked for the first semester than for the second, and was decidedly
greater for the College of Commerce than for the College of Arts and
Sciences. These two studies offer rather convincing evidence that knowledge
of intelligence test results may have a motivating effect on low-ranking
freshmen in high school and college. More recent studies have tended to
confirm these findings.
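
A mean point standing of the kind quoted from Table 33 can be computed from
a grouped frequency distribution by weighting the midpoint of each interval
by the number of students in it. The sketch below assumes that midpoint
method; the text does not say how its means were obtained. The function name
`grouped_mean` and the frequencies are hypothetical, chosen only for
illustration, and are not the entries of Table 33.

```python
def grouped_mean(intervals, frequencies):
    """Mean of a grouped frequency distribution, using interval midpoints."""
    if len(intervals) != len(frequencies):
        raise ValueError("each interval needs exactly one frequency")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("at least one case is required")
    # Weight each interval midpoint by the number of cases in the interval.
    weighted = sum(((low + high) / 2) * f
                   for (low, high), f in zip(intervals, frequencies))
    return weighted / total


# Point-standing intervals as labelled in the table, lowest first.
intervals = [
    (0.00, 0.19), (0.20, 0.39), (0.40, 0.59), (0.60, 0.79), (0.80, 0.99),
    (1.00, 1.19), (1.20, 1.39), (1.40, 1.59), (1.60, 1.79), (1.80, 1.99),
]

# Hypothetical frequencies for one group of 40 students (not the table's data).
frequencies = [4, 5, 6, 7, 6, 5, 3, 2, 1, 1]

mean_standing = grouped_mean(intervals, frequencies)
print(round(mean_standing, 2))
```

The same function could be applied to the experimental and control columns
separately, and the two means then compared.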

A great many more studies have utilized achievement test scores as
occasions for various types of motivation. In an early study Book and
Norvell used a knowledge of results in four laboratory experiments as a basis
for building morale or developing the “will to learn.” For example, students
in the experimental groups “were frequently told that if they would only
make up their minds to increase their score they would somehow find a way
to do it,” while at the same time the “method of measuring their output
and having them keep track of their score usually convinced them that this
was true.” Their data support the conclusion that this “special group of
incentives” help the experimental group to “make more improvement with
a given amount of practice than do the control groups.” But it is impossible
to tell just how important a knowledge of results by itself would have been.
An experiment by Hurlock, which utilized test results as occasions for
praise and reproof, attracted considerable attention. The subjects were
106 children in fourth- and sixth-grade arithmetic. The groups were equated
on an initial practice period of fifteen minutes. Four more practice periods
were held on successive days. The control group received the tests without
comment. The praised group had their names read aloud at the beginning
of each practice period. They were then called to the front of the room and
♦» Cf U K Cemptoa Student Evaluation of Knowing College Aptitude Te"! Score 
Icinuil p/EdiicaboncI P,ydwU„y 32 

AdminMoy and Sntar 

''"“2”otS'irrNoroeB 

.n Schoo. (Tort 

Journal of Educational Psychology, 16, March 1925.





received praise combined with exhortation to do still better work. Then the
names of the children in the reproved group were called, and they were
severely reproved for poor work, carelessness, and general inferiority. The
ignored group heard what was said to the others, but they received no
recognition whatsoever. The results are shown in Figure 39. After the first



day reproof seemed far less effective than praise, although somewhat better
than being ignored altogether. The control group made no progress
whatsoever. It is to be regretted that this experiment was not continued for
several days longer. Manifestly an hour's total working time is insufficient
to establish fully the comparative merits of these incentives as they would
operate day after day in the ordinary classroom.
In a somewhat similar experiment in the same grades, Hurlock studied
the effect of group rivalry on addition. The control group took their tests
for ten minutes on four days without comment. The experimental group
was divided into two equivalent subgroups which were pitted against each
other. The author emphasized the fact every day that the two groups “were
absolutely equal, and that one had as much chance to win as the other.”
Although the effect of rivalry was present in all types of pupils, it was most
marked in younger pupils and in inferior pupils. Increase in accuracy was
less than increase in speed, with some tendency for increase in speed






to be accompanied by reduction of accuracy. It is well to keep in mind
Thorndike's warning that “the attainment of active rather than passive
learning at the cost of practice in error may often be a bad bargain.”
Another study shows that repeated applications of praise or blame may
have different effects on introverted and extroverted pupils. Introverted
fifth-grade pupils improved faster in number cancellation exercises when
praised than did either introverts who were blamed or extroverts who were
praised. However, extroverted pupils when blamed improved faster than
extroverts who were praised or introverts who were blamed. Unfortunately,
one cannot safely conclude from a study which involved a total practice
time of three minutes upon highly artificial tasks that the same differences
would necessarily appear under ordinary school conditions. The problem
is worthy of further experimentation.


II. The Relation of Measurement to the Type of Learning


Closely related to the amount and quality of learning is the type of
learning or the learning procedure which is employed. There is considerable
evidence for thinking that effective work or study habits of the student are
of fundamental importance in learning. A question of major importance,
therefore, is: to what extent does the type of measurement used influence
the type of study technique employed by the student? Some important
studies bearing on this question have been conducted on the college level.
In a pioneer study, Terry found that 236 students in educational
psychology were “influenced to a significant extent by the type of examination
for which they were preparing.” The most striking characteristic of the
methods employed in preparing for an objective test which had been
announced a month in advance was the students' emphasis on details, while
they tended to study for large units of subject matter when they were
preparing for an essay examination announced for the next month. Douglass
and Tallmadge reported similar results at the University of Minnesota.
They found that the “objective type focuses attention upon details
and exact wording, while the subjective type apparently favors methods
involving organization, perceiving relationships and trends, and personal
reactions.”

There also appear to be significant differences among the various forms
of the so-called new-type examinations in their effect on study methods.


..E<l«ardL Thor„d,te and other, 

New York D Appfeton ^ The Effect of Repeated Praise 

“'TaunT “ w Vw sSn..“lL«e«. for Ob,ccl.ve and E«ay Te,t. 

<?/•; 001 Journal 33 592 603 April University Students Prepare 





Terry found, for example, that the one predominant method of preparing
for completion tests emphasized the word-for-word mastery of statements
considered important, while preparing for true-false tests involved methods
which dealt primarily with definitions and detailed facts such as the authors
and findings of experiments. The author's conclusion points out an
important educational implication:

The kind of test to be given, if the students know it in advance, determines in
large measure both what and how they study. The behavior of students in this
habitual way places greater powers in the teacher's hands than many realize. By
the selection of suitable types of tests the teacher can cause large numbers of his
students to study, to a considerable extent at least, in the ways he deems best for a
given unit of subject matter.

Meyer conducted a careful laboratory experiment with 124 psychology
students to determine the relation between the specific examination set and
immediate memory and delayed memory after five weeks. When the amount
of study was held constant, the method and results appeared to be largely
dependent on whether the set was for recall or for recognition tests. It
appeared that when students expected completion tests they studied with
more effort than they would have put forth for recognition tests. More
students made summaries and maps and otherwise attempted to obtain a
general picture of the material when they expected essay examinations than
otherwise. Meyer points out four practical implications:

1. Since it is more economical, when a given amount of time is spent in
studying, to use a recall examination set for delayed recognition or immediate
and delayed recall, recognition questions should be used in testing only when
they form a part of the entire examination or when students are unaware that
such questions are to be used exclusively.

2. If the teacher feels it necessary that the students be able to recognize certain
materials for a short time only, then the indications are that a recognition
examination set may be used. This means that the teacher must evaluate the material
in his course very carefully, since recognition tests, if given indiscriminately, may
have a deleterious effect on what the students ultimately retain of the course.

3. If the teacher feels it necessary that the students be able to recall isolated
facts when specific cues are given as to the fact wanted, a completion examination
set may be used with profit.

4. If the teacher wants the students to recall the material in an organized fashion
and to know facts when cues are not given, the essay examination set should be
used in preference to any objective type of examination set. Here again the teacher
must evaluate the material which he presents in the light of what the student
should learn from the course.


Paul W. Terry, “How Students Study for Three Types of Objective Tests,” Journal
of Educational Research, 27:333-343, January 1934.

George Meyer, “An Experimental Study of the Old and New Types of Examination,”
Journal of Educational Psychology, 25:641-661, December 1934, and 26:36-40,
January 1935.




The following quotation from Monroe suggests that the nature of the
examinations emphasized by the teachers influences students'
actions much more than the objectives of the course:

... of teachers formulating their objectives, and in response to the pressure
of authority, they have spent many hours in formulating lists of immediate
objectives, that is, the goals toward which ..., but their influence upon
students is practically nil in comparison with the influence of the tests
administered. Students direct their efforts toward becoming able to respond
to the tests they anticipate.


D. Some Educational Implications of Motivation Studies
Much of the experimental evidence on motivation has been fragmentary, some of it contradictory, and hardly any of it conclusive. But a few generalizations appear to have been fairly well established.

Implications for educational theory. In the first place, there is grave danger of premature and unwarranted generalizations in psychology and education. That it is hazardous to generalize from the laboratory experiment to the classroom application has been demonstrated in motivation experiments again and again. It is also dangerous to generalize from one age level to another. This is one of the greatest limitations of much of the experimental work on motivation. There is a great need for comparing the results of experiments made on the college level with results obtained from pupils on the elementary and secondary school levels.
In the second place, there are no fixed motivating categories such as knowledge of results, praise and blame, rewards and punishments, et cetera. Brenner states this point well:

The truth seems to be that there do not exist such psychological entities, but that they do act in specific situations depending upon all the factors of the situation as a whole. What in one situation may constitute praise, under certain other circumstances will be considered blame. The incentives derive their attributes, so to speak, from the situation in which they are active.

Implications for educational practice. Three points require brief mention. In the first place, the measurement program of the school influences both the teacher and the learner. It affects teaching emphasis and curriculum content as well as the amount and quality of learning and the procedure employed. In the second place, no motivating factor operates


Walter S. Monroe, "Some Trends in Educational Measurement," Twenty-Fourth Annual Conference on [...], page 32. Bulletin of the School of Education, Indiana University, Vol. XIII, No. 4. Bloomington, Indiana: Bureau of Cooperative Research, [...].

[...] Brenner, [...] Immediate and Delayed Praise and Blame upon Learning [...]. [...]: Bureau of Publications, Teachers College, Columbia University, [...].





universally. Both Chase and Hurlock, for example, found young children more susceptible than older children to the motivation used. In general, praise seems more effective with the duller and socially inferior groups. Frequent testing also seems most helpful to weaker pupils. On the other hand, there is some evidence that blame and knowledge of results are more effective in the stronger groups. Even in similar age and social groups, however, marked individual differences appear as to the relative effectiveness of different types of motives, or even as to the effectiveness of the same motive used at different times. Brenner warns against a

stereotyped habit of motivation, for instance always praising the children, always smiling and appearing pleased. This form of mechanized motivation is not adequate for increasing the performance of children, and it is doubtless harmful in its influence upon character building in children.

In the third place, no motivating factor operates automatically. Test scores, at best, merely provide an occasion for praise or blame, reward or punishment, or some form of social recognition. The strategic place of the teacher is nowhere more in evidence than in motivation. In a fundamental sense, the role of the teacher is to stimulate and guide the learning process. Perhaps Brenner's concluding statement does not put the matter too strongly:

The facts about the usefulness of a motive in a certain learning situation will be furnished by educational psychology, but proper application of the incentive in a given situation depends upon the insight of the teacher. The effectiveness or worth of a teacher depends upon his ability to make adequate use of motivation.


E. Practice Effect

The whole question of practice is intimately related to learning in general. One special aspect, practice effect on repeated tests, has received considerable attention. Several standardized tests, such as the American Council on Education Psychological Examination, contain short pretests which help the examinee "warm up" and acquire the proper set for the subtest that follows. Some examiners prefer to preface a testing session with easy practice material to "cushion" inexperienced testees and thereby put them more nearly on an equal footing with "test-wise" students. Even the effects of coaching on highly similar material may be overestimated, however, as Dyer has demonstrated quite well with preparatory school students.


Ibid., page 50.
Ibid., page 50.

[...] college-freshman forms are published by the Cooperative Test Division of Educational Testing Service.

For a surprising by-product of this procedure see Scarvia B. Anderson, "Prediction [...]," Journal of Applied Psychology, 37: 256-259, [...].

Henry S. Dyer, "Does Coaching Help?" College Board Review, No. 19: 331-335, February, 1953.





Nevertheless, it is undoubtedly true that the individual who takes a standardized examination for the first time in competition with experienced examinees is handicapped, especially if the test involves speed, complexity, and novelty.






Diagnosis 


A. The Problem of Diagnosis in Education
The nature of educational diagnosis. Educational diagnosis seeks to determine the nature and causes of unsatisfactory adjustment to the school situation. It is concerned with the specific weaknesses of individual pupils. Diagnosis seeks not so much to describe or explain educational maladjustment as to correct or prevent it. Adequate diagnosis is the basis of intelligent guidance and of effective teaching.
Education borrowed the term "diagnosis" from medicine, where its fundamental character has been long recognized. Medical diagnosis commonly starts with some bodily symptom, such as pain or abnormal temperature. The next step is to determine the causes that lie behind the symptoms. The trouble may be the malfunctioning of some organ or gland, which in turn may be caused by some particular germ or toxic condition, and which, when located, may yield readily to the appropriate medical treatment or surgery. The order of events is clearly indicated by the rule "Before you dose, diagnose!"

The situation in education is much the same, although here the scope of diagnosis is usually broader. At times educational difficulties can be traced to some organic defect, such as imperfect vision or hearing, or some glandular disorder, but educational diagnosis is more often concerned with functional disorders than with organic ones. Pupils who are perfectly normal organically may experience great difficulty with various aspects of the school situation. It is a matter of common knowledge that many serious learning difficulties arise, not so much from structural defects as from other factors, such as faulty habit-formation, lack of interest, or a poor home environment. Despite these complications an outstanding educator has asserted that "experts in reading, arithmetic, and spelling can now make diagnoses no less valid and reliable than are most diagnoses in medicine."
Furthermore, the learning process at any time is usually conditioned by many factors, both inside and outside the learner. It is rarely possible to isolate a single causative factor analogous to the disease germ in medicine, but the various factors may be classified roughly as follows:


a. [...] sensory equipment, glandular balance, health status, stage of maturity level, etc.
b. Intellectual: general intelligence, specific talents and deficiencies, etc.
c. Emotional: attitudes, interests, drives, prejudices, feelings of inadequacy
d. Educational: background, work habits, etc.

a. School environment: educational program, teacher, playmates, equipment
b. Extraschool environment: home, community, church, recreational facilities, etc.

The scope of educational diagnosis has also increased to keep pace with the growing concept of education. When the conventional school conceived of its function narrowly in [...] academic knowledge and skills, the scope of [...] school has enlarged the concept of education to make it synonymous with the growth of personality [...] diagnosis to locating the causes that interfere with the ordinary academic progress of the pupil. The [...] presented by the school curriculum will doubtless always [...] an important part of any program of diagnosis; in fact, [...] diagnosis naturally increases in scope and importance as the [...] school subjects are extended to include the less tangible [...] such as attitudes, interests, appreciations, tastes, and [...] judgment. But some of the most important and difficult aspects of diagnosis have to do with social adjustments and personality disorders [...].

It is likewise apparent that the [...] of diagnosis is much larger than the use of tests [...] does not mean that tests [...] diagnosis. On the contrary, an adequate diagnosis may [...] achievement [...] and specific, [...] teacher-made, as well as the use of various pieces of laboratory [...] for measuring sensory acuity [...] and the like. In addition [...] many kinds of tests, [...] such as rating scales, [...] observation, questionnaires, and interviews [...] important [...] forms of measurement in diagnosis [...], they are often by themselves insufficient. Keys has well stated the role of intelligence tests in diagnosis:

Few psychologists today look to an individual's score on an intelligence test, alone and of itself, to determine the source of his difficulties or indicate the exact solution to his problems. It is entirely probable, however, that the outcome of such a test, judiciously chosen and competently administered, will contribute as much if not more to sound clinical appraisal than any other single fact obtainable. Properly supplemented with other diagnostic procedures, the information thus derived is virtually indispensable to intelligent attack upon a wide variety of problems.

The importance of educational guidance in the modern school arises from two facts: (1) many pupils make unsatisfactory progress in school (some fail altogether and others achieve little), and (2) few causes of maladjustment lie on the surface or are self-evident.

It should be noted that up to the present time most tests designed specifically for diagnostic purposes have been for the elementary school. As long as the secondary school and college had highly selected student bodies, their need for diagnostic tools was less acute. In recent years the enlarged enrollments at these higher levels of education have greatly increased the need for diagnosis.

The value of diagnosis in education. There is an abundance of experimental evidence to show the value of educational diagnosis combined with the appropriate remedial measures. Such evidence is available on all levels of instruction and in a variety of subjects. Science has added confirmation to the verdict of common sense: it really helps to "put the oil where the squeak is." For example, Baker found that four months' special coaching of sixty nine-year-old pupils from seven Detroit schools resulted in a gain of about seven months in educational age. The coaching consisted of two thirty-minute periods per week devoted to the subject or subjects in which the pupil had shown weaknesses. Scruggs compared the improvement of two equivalent classes of fifth-grade Negro children in Kansas City, one of which had the ordinary group instruction in handwriting and the other an equal amount of corrective practice based upon a detailed analysis of the weaknesses of each pupil. In seven weeks the second group increased the average quality of its handwriting about twice as much as the first. In a similar study Guiler found that fourteen seventh-grade pupils made


[...] "Applications of Intelligence Testing," Review of Educational Research, 8: 256, June, 1938.

Baker, Educational Disability and Case Studies in Remedial Teaching, page [...]. Bloomington, Illinois: Public School Publishing Company, 1929.

[...] Scruggs, "Remedial Teaching for Improvement in Handwriting," Journal of Educational Research, 23: 288-295, April, 1931.

Guiler, "Improving Handwriting Ability," Elementary School Journal, 30: 56-62, September, 1929.





in three months a normal gain of three years in quality of handwriting. Blair has summarized studies in the tool subjects on the secondary level which show similar results.

It has been shown that the value of such remedial measures is by no means confined to skill subjects such as handwriting and spelling. For example, a study by Leonard showed that junior high school pupils improved more rapidly in the ability to write compositions free from common errors in capitalization and punctuation during a program involving error analysis and appropriate remedial exercises than did pupils of like ability exposed to the conventional method of teaching. While both groups showed definite improvement, the mean decrease in the twenty-eight most frequent errors after eleven forty-five-minute practice periods was approximately twice as great for the experimental as for the control groups. Experiments by Guiler on the elementary school, the senior high school, and the college levels showed comparable results from similar methods.

Stone found that pupils in the fifth and sixth grades in twenty-three schools, who devoted not more than forty minutes a day for five weeks to diagnostic and practice [...], gained two to six times as much in ability to solve reasoning problems as did pupils [...] regular arithmetic work. [...] the results of the study indicated that the [...] reasoning ability resulting from this diagnostic and remedial program [...] about twice as great for pupils in the highest sixth in [...] lowest sixth, that the gain transferred to problems of a different content, and that it persisted for at least a year, at the end of [...].

The psychology of [...] reasonably clear. It is a sound principle of teaching that learning always begins where the learner's present knowledge [...]. Failure to observe this principle [...] impossible things. One of these is attempting to teach [...] pupil [...] check-ups on the pupil's progress.

[...] Blair, [...] Remedial [...] in Secondary Schools, 422 pages. New York: The Macmillan [...].

J. Paul Leonard, [...] Capitalization and [...], 450-458, June, 1933.

[...] in the Elementary [...], 110-115, October, [...].

[...] Ability to Reason in Arithmetic [...].





B. The Techniques of Diagnosis

The levels of diagnosis. The process of educational diagnosis may be profitably thought of as falling into five steps, or levels. Figure 40 is a graphical representation of the process. It will be noted from the questions asked at each level that the first four steps (the W's) have to do with corrective diagnosis, while the highest level has to do with what may be termed preventive diagnosis. In other words, the immediate purpose is correction, but the ultimate purpose is prevention.

Figure 40. The Five Levels of Educational Diagnosis.

Locating the individuals needing diagnosis. How can we best locate the pupils not making satisfactory adjustment to the school situation? This is logically the problem with which the program of educational diagnosis begins. The order of events is not unlike that described in the famous recipe for making rabbit stew which begins: "First you catch your rabbit." Strictly speaking, however, while it is a necessary preliminary step, it is hardly a part of the actual process of diagnosis.

Various ways of locating the individuals who require diagnostic study have been used. Survey and group intelligence tests are often employed to screen those whose achievement is unsatisfactory. Using this method Wilson found that about 10 per cent of the pupils in the seventh and eighth grades of fifteen representative cities and towns in the metropolitan area of Boston needed corrective instruction in the fundamental arithmetic processes.


Several writers suggest that any pupil whose level of achievement is well below his level of intelligence is worthy of special study. Others contend that a practical difficulty with the procedure is that tests of achievement and so-called tests of intelligence really largely measure the same thing, and suggest instead that diagnostic study be given to those pupils whose achievement in some school subject, or subjects, is well below their general achievement level. Still other writers rely heavily upon the judgment of the


[...] Wilson, "[...] Corrective Load in the Fundamentals of Arithmetic in Grades [...]," pages 234-241 in The Role of Research in Educational Progress. Washington, D. C.: American Educational Research Association, May, 1937.











[...] by taking those who had received final marks of failure or conditional passing in four fundamental subjects. He admits that this criterion was used at the outset primarily because of its availability, but states that "it arose steadily in our esteem."

All these suggestions have merit. The judgment of the present teacher should always be taken into account, especially since in the ordinary school whatever diagnostic and remedial work is attempted will be undertaken by the regular classroom teacher. But the present teacher's judgment needs to be supplemented by considering the judgment of past teachers as reflected in the school record. Since the judgment of teachers is not infallible, however, general achievement tests and intelligence tests will be found particularly valuable. Any pupils in the intermediate grades whose achievement falls a year or more below their age or grade level should usually merit some study. Discrepancies between achievement and intelligence are of particular significance when intelligence has been measured by individual tests or performance tests rather than by ordinary group tests. Such discrepancies also assume added significance when the pupil has apparently had ample opportunity for learning.
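
The screening rule just stated (achievement a year or more below grade level) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Pupil` record, the use of grade-equivalent scores, and the sample figures are hypothetical and do not come from any study cited here.

```python
from dataclasses import dataclass


@dataclass
class Pupil:
    name: str
    grade_placement: float    # hypothetical: 5.2 means second month of grade 5
    achievement_grade: float  # hypothetical grade-equivalent test score


def needs_study(pupils, threshold=1.0):
    """Return pupils whose achievement lags grade placement by
    `threshold` years or more (the 'a year or more below' rule)."""
    return [
        p for p in pupils
        if p.grade_placement - p.achievement_grade >= threshold
    ]


# Illustrative class list (invented figures).
pupils = [
    Pupil("A", 5.2, 5.4),  # above grade level
    Pupil("B", 5.2, 4.1),  # about 1.1 years below
    Pupil("C", 5.2, 4.6),  # about 0.6 years below
]

flagged = needs_study(pupils)
print([p.name for p in flagged])
```

Such a list is only a starting point. As the text notes, it should be weighed together with teacher judgment and the school record.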

While special study and treatment are often justified for the lowest 5 or 10 per cent in the typical class, it must not be thought that diagnosis should be restricted to low-ranking pupils and to obvious misfits. On the contrary, some of the most profitable cases are those whose achievement is average or even above, but is nevertheless well below what appears possible. As a matter of fact, Hildreth points out that many clinics prefer not to attempt remedial work with very dull pupils, say those with IQ's of approximately 80 and below, but prefer instead to alter the achievement goals for such children. It will be found at times that pupils whose personality defects interfere with satisfactory social adjustment have superior academic achievement. In fact, it may be that the teacher should often be most concerned about the mental health of those who give her least concern academically. The writer recalls the case of a sixth-grade girl whose scholastic achievement was well above the norms on the tests but whose attempts at social adjustment to the group had been distinctly unsuccessful. The girl told her mother that she would give anything in the world if she had just one friend. In the conventional school this girl would have been regarded as making an entirely satisfactory record, but in the modern school she is seen to be so seriously maladjusted as to require special [...]

[...] the nature of the difficulty. After locating the pupils who are experiencing trouble, the next step is to make a careful examination






of the difficulty of each pupil. A bill of particulars is needed. It is just here that diagnostic tests, if available, are of great value. The aim of such tests is to reveal the specific location of the pupil's difficulties. As a rule, each test has a limited scope, but attempts to explore thoroughly this restricted area. For example, one test might undertake to find the particular number combinations which are causing trouble in the addition of whole numbers, while another test attempts to find out whether inadequate reading ability, faulty technique of analysis and procedure, lack of skill in the fundamental processes, or some other factor is responsible for poor performance in reasoning problems.

Most of the diagnostic tests published to date are limited to the tool subjects, mainly on the elementary level. Traxler prepared a comprehensive bibliography of available tests together with a practical discussion of their effective use. Blair compiled similar information with special reference to the high school. Traxler offered this warning: "Our experience at the Educational Records Bureau indicates that, at present, there is scarcely one test which gives us as much reliable information as is needed for effective diagnosis in any one field."

But any test, whether standardized or not, can be used to reveal the location of errors. The principal advantages of the standardized test are that in content it is likely to represent a more careful selection than the informal test and that the existence of comparable forms makes it possible to verify the accuracy of diagnosis based on one form and to check upon the success of any remedial measures undertaken. However, these special values in standardized tests by no means rule out the values of informal tests when used for diagnostic purposes. In reading, for example, some writers regard informal tests as even more important than standardized tests. The diagnostic value to be realized depends more upon the teacher than upon the test used. Durrell estimates that at least 75 per cent of the cases requiring special attention in reading can be handled adequately by well-trained classroom teachers using non-standardized tests supplemented by observation of the pupils' achievement and work habits. He says:

Such informal tests and observation charts usually indicate the correct level on which to start remedial instruction, the specific reading abilities in which the child is weak, and the faulty habits and confusions which must be overcome in the remedial program.


Figure 41 illustrates a useful procedure for analyzing the errors revealed

Arthur E. Traxler, The Use of Test Results in Diagnosis and Instruction in the Tool Subjects, 80 pages. New York: Educational Records Bureau, 1942.

Glenn Myers Blair, op. cit.

Arthur E. Traxler, "Individual Evaluation," in New Directions for Measurement and Guidance, page 28. Washington: American Council on Education, 1944.

Donald D. Durrell, Improvement of Basic Reading Abilities, page 18. Yonkers: World Book Company, 1940.

Ibid., page 296. Quoted by special permission.



[Figure 41. Pupil-by-problem chart of errors and omissions. (Last 21 Problems Are Omitted)]












by a standard test in arithmetic. The procedure is equally applicable to informal tests. This particular test, Test 3, Arithmetic Fundamentals of the Metropolitan Achievement Tests, was administered to a fifth grade in October. The pupils are arranged in descending order according to the score on this test. Each error is indicated by X and each omission by 0 as far as the pupil attempted problems; the problems beyond the last one attempted are indicated by [...]. The summary at the bottom shows how many times each problem in the test was missed and omitted. This simple analysis reveals clearly what type of problems caused trouble and to whom the trouble was caused. The procedure is really group diagnosis, but it may be regarded as the first step in individual diagnosis. It should be apparent that classroom teachers who are content merely to obtain the total score made by each pupil on a test are really overlooking the greatest value of the test for instructional purposes.
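
The tabulation described above can be sketched in Python. The sketch assumes a hypothetical encoding that is not taken from Figure 41: one string per pupil with one character per problem, where "." is a correct answer, "X" an error, "0" an omission, and "-" a problem beyond the last one attempted. The pupils and marks are invented.

```python
def tabulate(chart):
    """Count, for each problem, how many pupils missed it ('X')
    and how many omitted it ('0').

    `chart` maps a pupil's name to a string of marks, one per problem.
    Returns two lists of counts, indexed by problem position.
    """
    if not chart:
        return [], []
    n_problems = len(next(iter(chart.values())))
    missed = [0] * n_problems
    omitted = [0] * n_problems
    for marks in chart.values():
        for i, mark in enumerate(marks):
            if mark == "X":
                missed[i] += 1
            elif mark == "0":
                omitted[i] += 1
    return missed, omitted


# Invented example: three pupils, four problems.
chart = {
    "Pupil 1": "..X.",
    "Pupil 2": ".XX0",
    "Pupil 3": "X0X-",
}

missed, omitted = tabulate(chart)
for number, (m, o) in enumerate(zip(missed, omitted), start=1):
    print(f"Problem {number}: missed {m}, omitted {o}")
```

Reading across a row shows where one pupil is in trouble. Reading the column totals shows which problems troubled the class as a whole.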
Similar error analyses can be made for most subjects, but are especially valuable in mathematics, spelling, reading, handwriting, and language. It is usually better to make more than one such analysis, however, than to rely upon a single sampling, which is almost sure to include some errors that are merely chance occurrences rather than habitual. Brueckner and Elwell, for example, found from the study of a test in the multiplication of fractions containing in random order four examples of each type, that failure to work a single example correctly is hardly a safe index and that at least three problems of each type are required for a valid individual diagnosis. A later study in subtraction showed that all the problems of a type should be grouped together on the test.

It is not sufficient, however, to stop with tabulating the frequencies of questions missed on tests or mistakes made in written work. A further analysis must be made of the types of errors represented. It will be noted that problem 24 in Figure 41 was missed by 23 pupils out of 28. As a basis for remedial instruction the teacher needs to know what types of incorrect solutions were made by her pupils. An examination of the test papers provides the answer. Problem 24 follows:



It is found that 15 of the 23 incorrect solutions were 7[...], merely a failure to reduce the fraction to its lowest terms. Five of the 6 errors made by the


Leo J. Brueckner and Mary Elwell, "Reliability of Diagnosis of Error in Multiplication of Fractions," Journal of Educational Research, 26: 175-185, November, 1932.
Leo J. Brueckner and Mabel J. Hawkinson, "The Optimum Order of Arrangement of Items in a Diagnostic Test," Elementary School Journal, 34: 351-357, January, 1934.





best 7 pupils were of this type. Five pupils got as an answer 7[...] [...] the pupil [...] for an answer. An interesting type of incorrect solution is represented by a pupil whose answer was 7[...]. It is apparent that he merely added the numerators and the denominators without taking the trouble to reduce the fractions to a common denominator. The other wrong answer was 7[...].

A second illustration of the value of error analysis is taken from spelling. A few years ago the writer gave a spelling test to a class of high school seniors. The results were disappointing. One of the words missed most often was "undoubtedly." Contrary to expectation, a tabulation of the errors revealed the fact that the first two syllables were spelled correctly by all pupils. The misspellings were of four forms: "undoubtelly," "undoubtely," "undoubtaly," and "undoubtally." It can be seen that the fundamental error is mispronunciation. The pupils were attempting to spell this common word as they were accustomed to pronounce it. Hildreth reports that confusion over vowels in the middle and end syllables is a prolific source of error, and that syllables containing e, [...], and o are especially liable to vague, indistinct pronunciation. Another investigator found that emphasis upon correct pronunciation in reading resulted in a decided improvement in the spelling of pupils in the fifth and sixth grades.

One of the greatest values of such error analyses is that they reveal that a relatively few types of errors, made over and over again, are responsible for the poor performance of most pupils. In an early study of errors in spoken language Charters found that 71 per cent of the errors made by Pittsburgh children fell into only five classes. A study in Madison, Wisconsin, revealed that more than half of the total number of language errors made from the kindergarten through the sixth grade represented but four types. In an extensive study Newland found that errors in writing only four letters, a, e, r, and t, accounted for almost half of the illegibilities made, whether by elementary school, high school, or adult groups, and that only four types of difficulties in letter formation caused more than half of the illegibilities. It cannot fail to be encouraging to teachers and pupils alike to find that remedial efforts directed at a relatively few troublesome points may result in great improvement.
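
The point that a few error types account for most errors can be checked on any tally of classified errors. The Python sketch below is a minimal illustration: the function name, the error labels, and the counts are hypothetical and are not data from Charters, Newland, or the Madison study.

```python
from collections import Counter


def leading_types(errors, share=0.5):
    """Return the most frequent error types that together account for
    at least `share` of all errors, with the share they actually cover."""
    counts = Counter(errors)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    picked, running = [], 0
    for kind, n in counts.most_common():
        picked.append(kind)
        running += n
        if running / total >= share:
            break
    return picked, running / total


# Invented tally of twelve classified language errors.
errors = (
    ["verb form"] * 6
    + ["double negative"] * 3
    + ["pronoun case"] * 2
    + ["other"] * 1
)

types, covered = leading_types(errors, share=0.7)
print(types, covered)
```

In this invented tally two error types cover three-quarters of the errors, which is the kind of concentration that makes focused remedial work economical.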

Locating the causes of errors. Even more important, and usually far more difficult, than knowing where the errors occur is knowing why they occur. One limitation of test scores in diagnosis is that they reveal the

[...] Kay, "The Effect of Errors in Pronunciation upon Spelling," Elementary English Review, 7: 64-66, March, 1930.

Unpublished report made in 1919.

Language Curriculum Committee Reports. Madison, Wisconsin: Madison Public Schools, 1932.

T. Ernest Newland, "An Analytical Study of the Development of Illegibilities in Handwriting from the Lower Grades to Adulthood," Journal of Educational Research, 26: 249-258, December, 1932.





products of learning rather than the learning process itself. Tyler makes a useful distinction between measurement, or appraisal, and interpretation, or inference. In other words, causation is not established directly by the act of measurement, but must be inferred from the measurement and other pertinent data. Scates puts the situation clearly:

A multitude of test scores are in themselves meaningless. They show facts, but they do not show reasons. They neither diagnose nor evaluate. They may be useful aids, but they leave the principal problem to the teacher's insight, namely, that of determining what is indicated.

At times, as in some of the examples cited, a reasonably safe inference can
be made from the nature of the errors themselves. But rarely can a
sufficiently complete explanation be made without considering the child's
past history, outside the school as well as inside. It is never safe to infer
that a child's poor performance in school is due to mental deficiency or
personality defects unless a careful study of his educational opportunities
has been made. Fortunate indeed is the school whose records are sufficiently
complete to provide the essential data.

Certain outstanding physicians and surgeons have advocated an enlarged
concept of diagnosis in modern medicine. Several years ago Sir William
Osler argued that it was more important to know what kind of man had a
certain disease than to know what disease the man had. Wilbur has made
the following statement:

It is just as important in these days for a young doctor to understand his
patient's personal life, home responsibilities, and community relationships as
it is to be able to tell just what organisms are living in his lungs or
invading his liver. The doctor who has not studied psychology and who cannot
acquire a knowledge of it, if he is to be successful, will have to confine
himself to work in the laboratory or be a pure technician.

Hildreth suggests that the following five "areas of investigation" are
important in diagnosis:

Mental equipment of the learner. Aptitude for academic schoolwork, learning
capacity, readiness for learning, habitual modes of response, judgment,
reasoning ability, insight, memory, association, perception, attention span,
ability to see relationships, creative ability, intellectual interest,
suggestibility, comprehension, auto-criticism habits.


Ralph W. Tyler, "Elements of Diagnosis," Thirty-Fourth Yearbook of the
National Society for the Study of Education, page 113. Quoted by permission
of the Society. Bloomington, Illinois: Public School Publishing Company, 1935.

Scates, "Differences Between Measurement Criteria of Pure Scientists and of
Classroom Teachers," Journal of Educational Research, 37: 1-13, September,
1943.

Wilbur, "The March of Medicine," Science, 87: 201-202, March 4, 1938.

Hildreth, Learning the Three R's, pages 547-549. Minneapolis: Educational
Publishers, Inc., 1936.





Language equipment. Command of mother tongue, knowledge of foreign
languages, language first learned, speech defect, immaturity in speech, in
articulation, or diction, vocabulary, rapidity or slowness of speech, history
of speech development, age of using words and sentences, descriptive powers,
written composition.

Personality, temperament, and dynamic equipment. Self-control, affability,
desirable and undesirable inhibitions, attitudes, friendliness, susceptibility,
docility, irascibility, drive, perseverance, stability, lability of mood,
compliance, responsiveness, restlessness, shyness, tendency toward
embarrassment, daydreaming, fears, withdrawal from reality, sex interest,
morbid curiosity, irrational attitude, manners, attitude toward failure and
toward the school, disability compensations, child's interests, attitude
toward school, preferred school subjects, child's play interests, obsessions,
fears, worries, ability to get along with other children, social qualities,
attitude toward brothers and sisters and other members of the family,
delinquent and anti-social activities, degree of normal adjustment, changes,
growth, and development in all these factors since birth.

Physical status, sensory and motor equipment, physical condition. Sensory
acuity, constitutional defects, physical maturation, physical handicaps and
defects, disease, glandular balance, condition, etiology of illness, posture,
accident or unusual physical shocks, nutrition, diet, hygiene, ... strength
or weakness, handedness, steadiness, coordination, efforts to change
handedness, facility in sports and games.

Environment and home. ... marital status of ... e.g., books, musical
instruments, labor-saving devices in the home ... their contact with this ...
home adjustments, attitude of home ... neighborhood environment ... child,
activities in ... daily schedule: rising, eating, sleeping, play, schoolwork
at home, regularity or irregularity in ...

School situation, history of ... Methods of instruction, especially in the
work with which the child ... difficulty, size of class groups, capability of
class groups, school marks, textbooks ... speed, progress of other children,
progress in ... retardation, failure or double promotion, attitude of child
... teacher, teacher's usual success with pupils of her grade level, rapidity
with which average child progresses, requirements of the course, teacher's
classification system, provision for individual assistance, date of ... the
child's disability, former diagnostic and remedial ... school and in clinics,
survey of all school records that would throw light on the situation ...
analysis of previous training and ... extent of supervision, objective ...
instruction ...




the teacher has been making to eradicate the difficulties, extent to which the
teacher capitalizes the child's interests.

It is, of course, manifestly impossible, as well as usually unnecessary,
to consider all these facts in any particular case. Satisfactory explanation
of the less serious cases can often be found in a relatively few factors,
although rarely, if ever, in just one. The more serious will usually be
found more complex to analyze as well as more difficult to remedy.

It will frequently be necessary to supplement the data of the existing
school records. A visit to the pupil's home is often helpful. A careful
observation of the pupil at work is another fruitful source of information.
Objective records of observations made under controlled conditions are
particularly important. Considerable light is often thrown upon the attitudes
and work habits of unsuccessful pupils by observing them at work and then by
comparing successful pupils under similar conditions.

A skillful interview by a tactful teacher will sometimes give a clue to the
difficulty when other methods fail. In the upper grades and the high school,
check lists, questionnaires, and other forms of written responses are
valuable aids to the personal interview. Having the pupil "think out loud"
through the solution of a problem in mathematics or science, or give an
explanation of the procedure used, is often most illuminating.

Two illustrations make clear the value of the interview as a supplement
to the written test in locating the sources of difficulty in arithmetic.
Buswell tells of a boy of better than average intelligence whose work in
column addition was both slow and inaccurate. To the interviewer he explained
that he did not like to add and so wanted to get the worst of it over as soon
as possible. For this reason he always added the numbers according to size,
beginning first with the largest numbers and leaving the smallest ones till
last. But as this technique meant skipping up and down the column, it
involved great risk of omitting some of the numbers altogether and of adding
others more than once. The story is told of a sixth-grade school girl
who had an elaborate but somewhat ineffective "system" for solving reasoning
problems. Her explanation was somewhat like this: "Whenever there's lots of
numbers I add, but when there's only two numbers with lots of parts [digits]
I subtract. But if there is just two numbers and one is littler than the
other, I divide when they come out even, and multiply when they don't." It is
most unlikely that any analysis of test papers, or observation of the pupils
at work, would have resulted in a correct inference as to the real trouble in
either of the cases above.
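
The girl's "system" is explicit enough to be written out as a procedure,
which shows why it fails: the operation is chosen from the look of the
numbers and never from the meaning of the problem. The Python sketch below
is one reading of her words, not a quotation. The function name
`pupil_system`, the cutoff of three digits for "lots of parts," and the
sample problem are all assumptions made for illustration.

```python
def count_digits(n):
    """Number of digits in a whole number, ignoring any sign."""
    return len(str(abs(n)))


def pupil_system(numbers, many_digits=3):
    """Operation chosen by the pupil's reported rule (an interpretation of her words)."""
    if len(numbers) < 2:
        raise ValueError("the rule needs at least two numbers")
    if len(numbers) > 2:                      # "lots of numbers"
        return "add"
    a, b = numbers
    if count_digits(a) >= many_digits and count_digits(b) >= many_digits:
        return "subtract"                     # "two numbers with lots of parts"
    large, small = max(a, b), min(a, b)
    if small != 0 and large % small == 0:     # "when they come out even"
        return "divide"
    return "multiply"


# Hypothetical problem: 3 boxes hold 12 pencils each; how many pencils in all?
# The correct operation is multiplication, but the rule looks only at the numbers.
print(pupil_system([12, 3]))
```

Because 12 divides evenly by 3, the rule selects division for a problem that
calls for multiplication. A written test would record only the wrong answer;
the interview exposes the rule that produced it.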

Teachers often find that an interview with the pupil sheds needed light
upon difficulties in reading and English. Pressey and Campbell report
that one ninth-grade pupil explained capitalizing the word "Pirates" on the
ground that pirates are real persons just as much as "John Silver" or
"Captain Kidd." Another teacher discovered that a boy had written "a
quarter to three" in answer to a question on a reading test when the correct
answer was "twenty-five minutes till three" because "everybody knows that
twenty-five cents make a quarter."

Sidney L. Pressey and Pera Campbell, "The Causes of Children's Errors in
Capitalization: A Psychological Analysis," English Journal, 22: 197-201,
March, 1933.

Brownell has shown the possibilities of classifying the mental processes
used by the pupils, as revealed by interviews, according to levels of
maturity represented. He concludes that a reasonably flexible interview
technique in analyzing learning is "exceedingly valuable if it is sagaciously
employed." One survey of the experimental literature relating to the
reliability of the interview arrives at the conclusion that "with well-trained
interviewers working under carefully defined conditions, quantitative
interview ratings representing a complex over-all evaluation can be made as
reliable as most personality tests and more reliable than some of them."
Nevertheless, good interviewing requires skill as well as time and patience.

Remedial procedures. The ultimate purpose of diagnosis is to afford
a basis for effective remedial procedures. When the cause or causes of the
pupil's unsatisfactory adjustments have been determined, an intelligent
program of correction can be planned, and not until then. Whenever the
same causes appear to operate in several pupils, group measures may be
satisfactory. Usually, however, remedial programs must be planned for each
pupil individually.

A study by Davis shows the close relationship between educational
diagnosis and remedial instruction. Two extra periods a week were devoted
to 275 pupils of poor spelling ability in grades 2B to 6A inclusive. The
results showed "marked improvement." Pupils remained in the remedial
classes until they made perfect scores on the spelling tests of two successive
Fridays. The average time required was 75 hours and bore little relationship
either to intelligence or grade location. Twenty-four different types
of difficulties were located, and listed with each difficulty were the most
successful remedies found by the teachers. The ten most common difficulties,
with their remedies, are shown in Table 34.

Traxler has prepared some very convenient charts which outline appropriate
diagnostic and remedial procedures for common types of disabilities
in reading, arithmetic, language usage, spelling, and handwriting.
Figure 42 shows the chart for handwriting. Note that a detailed analysis
of samples of the pupil's writing is suggested, as well as diagnostic charts
and tests.


William A. Brownell, "Rate, Accuracy, and Process in Learning," Journal of
Educational Psychology, 35: 321-337, September, 1944.

... Cameron, ...

... Evaluation Program ...

Davis, "... Spelling ...," ... Journal, 27: 615-..., April, 1927.





TABLE 34

DISTRIBUTION OF SPELLING DIFFICULTIES AND SUCCESSFUL REMEDIES (AFTER DAVIS)

Difficulties and Remedies

1. Has not mastered the steps in learning to spell a word
   a. Teach steps until every child knows them and uses them
   b. Study each word with the children

2. Writes poorly
   a. Discover particular letters or combinations of letters that are
      difficult and practice on these letter combinations
   b. Practice words containing writing difficulties

3. Cannot pronounce the words being studied
   a. Go over the words before the children study them so that every
      child will know what he is studying
   b. Help the child to unlock words for himself

4. Has bad attitude toward spelling
   a. Supervise study closely so that the child will get into the habit of
      studying words correctly without wasting time
   b. Try to show need for study
   c. Give study work under time pressure
   d. Try to appeal to pride
   e. Try to work up competition with self (that is, of the pupil with
      himself)
   f. Give reward

5. Does not associate the sound of the letters or the syllables with the
   spelling of the word
   a. Teach letter sounds
   b. Listen to careful pronunciation
   c. Teach the child to syllabify words
   d. Say words slowly again and again to hear sounds

6. Needs more time than can be devoted to spelling in the regular class
   a. Give more time after school or during the day when other work is
      finished

7. Is discouraged because he misspelled so many words in the Monday test
   a. Take a few words at a time
   b. Study at odd times during the day
   c. Have the pupil stay longer in the afternoon than the others

8. Has speech defect
   a. Listen to pronunciation
   b. Look at word carefully
   c. Teach difficult combinations

9. Does not mark paper correctly
   a. Teach child how to check
   b. Insist on rechecking
   c. Always check paper

10. Interchanges letters
   a. Study words carefully
   b. Underline difficult part
   c. Try to spell by syllables


Frequency 





An effective method with bright pupils may fail with dull. In fact, no
method is likely to improve materially the academic achievement of the
mentally deficient child. Even with normal or superior children the
substitution of correct habits for incorrect will require time. No sudden trans-

CHART V. HANDWRITING

SUGGESTED DIAGNOSTIC AND REMEDIAL PROCEDURES

1. Slant
   Type of defect: a. Too much slant; b. Writing too straight; c. Lack of
   uniformity.
   Diagnostic procedure: Use diagnostic chart; study different samples of
   writing. Draw lines through letters parallel to slant on different parts
   of page. Compare these lines as to direction. Observe pupil as he writes
   and note details: position, paper, etc.
   Suggested types of remedial treatment: Some instances of poor slant can
   be corrected by changing position of writing arm or manner of grasping
   pen. Change in position of paper will help others. Note that paper should
   be at an angle. Other pupils must learn to turn their hand as they
   approach end of line. Explain to pupils effect of slant on quality.

2. Alignment
   Type of defect: a. Lack of uniformity; b. All letters about the same
   height.
   Diagnostic procedure: Use diagnostic chart; draw horizontal lines through
   writing even with top and bottom of some of the letters.
   Suggested types of remedial treatment: Explain defect to pupil. Lack of
   uniformity of alignment results partly from motor inco-ordination and
   will probably be corrected as co-ordination of writing movements improves
   through practice.

3. Quality of line
   Type of defect: a. Writing too heavy; b. Writing too light; c. Line wavy
   and uncertain.
   Diagnostic procedure: Use diagnostic chart; note type and size of pen and
   manner of holding it; note speed of writing.
   Suggested types of remedial treatment: Make sure that pupil has proper
   writing materials; see that he does not use his writing arm to support
   his body. If line is thin and wavering, give drills to speed up movement
   and improve co-ordination.

4. Formation of letters
   Type of defect: a. Poor general form; b. Lack of smoothness; c. Parts
   omitted; d. Parts added; e. Letters not closed.
   Diagnostic procedure: Use diagnostic chart if desired; letter form may be
   analyzed in detail with Pressey chart. Study general form and habit of
   forming each letter. Often faults in letter form are related to only a
   few letters.
   Suggested types of remedial treatment: Make some use of movement drills
   to improve smoothness. Practice especially on movements common to several
   letters. Study details of letter form with pupils and show them where
   they need to improve. Have pupils practice individually on the letters
   which diagnosis has shown to be poorly formed.

5. Spacing of words
   Type of defect: a. Too wide; b. Too narrow; c. Not uniform.

6. Spacing of letters
   Type of defect: a. Too wide; b. Too narrow; c. Not uniform.
   Diagnostic procedure (5 and 6): Use diagnostic chart; study various
   samples; note whether wide spacing or crowding occurs on different part
   of page. Observe pupils while writing for evidence of too much lateral
   movement.
   Suggested types of remedial treatment (5 and 6): Explain fault to pupil.
   Have him pay especial attention to spacing while writing samples to be
   inspected by teacher. Movement exercises are of some value in improving
   spacing.

7. Size of writing
   Type of defect: a. Too large; b. Too small; c. Lack of uniformity.
   Diagnostic procedure: Study different samples and compare with those of
   other pupils in the same grade. Considerable variability is allowable
   among individuals and especially between grades. Young pupils tend to
   write large. Note freedom of movement. Try to discover causes of lack of
   uniformity.
   Suggested types of remedial treatment: Writing that is too small may
   result from a cramped finger movement. Give movement exercises to relax
   pupil and bring about some arm movement. If writing is too large, pupil
   can sometimes correct it through conscious effort if his attention is
   called to it. In young pupils improvement may have to await the process
   of maturation.

8. Writing not neat
   Type of defect: a. Blotches; b. Words crossed out and rewritten.
   Diagnostic procedure: Examine samples of writing, especially those
   prepared in daily work. With respect to blotches, see if writing
   materials are defective.
   Suggested types of remedial treatment: See that pupil has proper writing
   materials and that they are kept in working order. Explain effect of lack
   of neatness on all school work. Make daily work in other subjects the
   gauge of neatness.

9. Speed
   Type of defect: a. Writing too slow; b. Writing too fast.
   Diagnostic procedure: Speed of writing affects the quality, but aside
   from this fact it is important in that some pupils write so slowly and
   laboriously that they have difficulty in preparing assignments on time.
   Give a test of speed of writing and compare number of letters per minute
   with grade norms.
   Suggested types of remedial treatment: If writing is too fast, show pupil
   its effect on letter form and have him write samples under timed
   conditions. Some pupils go to the other extreme and write so slowly that
   they practically draw the letters. Give movement exercises while counting
   rapidly to break down habits of slow movement. Insist that pupils speed
   up writing regardless of their letter forms. Their writing will probably
   deteriorate for a time, but when old habits are broken down teacher and
   pupil can build new

Figure 42. Traxler Chart of Suggested Diagnostic and Remedial Procedures in
Handwriting.




formation is to be expected. But if only negligible progress results from
extended practice, the remedial program should be revised.

Preventive diagnosis. In the long run, the greatest value of a diagnostic
and remedial program is the discovery of preventable factors within the
control of the school which lead to maladjustment. Frequently modifications
in school organization, curriculum, instructional materials, and teaching
methods are suggested by an analysis of what is happening to the pupils
under the existing program. Manifestly, factors which have produced learning
difficulties in the past are likely to do so in the future. It is always
better, and generally easier, to prevent errors than to correct them. It will
often be found that a program of studies which provides wider differentiation
in method and content to suit pupils of varying abilities and interests
is the way out of many difficulties. The systematic use of readiness tests
of various types to determine when the pupil is sufficiently mature
physically, mentally, and socially to begin the regular work of the grade,
and a judicious use of aptitude tests to establish the pupil's fitness for the
more formal and abstract subjects such as arithmetic, algebra, and foreign
languages, will prevent much needless failure. Terman says:

goal 


Terman, "Foreword," in Grace M. Fernald, Remedial Techniques in Basic
School Subjects, page ... By permission of the publisher, McGraw-Hill Book
Company.






13 

Classification and Promotion 


A. The Nature and Educational Significance of
Human Variability

The problem of human variability. The existence of variability is
one of the best established facts about human beings. Obvious differences
in height, weight, strength, and good looks could hardly escape the notice
of the most casual observer. The greatest seers and wise men of all ages
have recognized also the less obvious but more important differences in
ability, interests, and needs. One of the familiar parables of Jesus, for
example, is that of the talents.

It would be difficult today to find a fuller recognition of the educational
significance of individual differences than appears in the writings of those
two apostles of human liberty, Jean Jacques Rousseau and Thomas Jefferson.
Rousseau asserted that "it would be a great mistake to bestow it
[instruction] on all children indiscriminately and without regard to their
individual differences." Jefferson wrote of a proposed educational measure:
"The general objects of this law are to provide an education adapted
to the years, capacity, and the condition of every one, and directed to his
freedom and happiness." It is apparent, therefore, that when the author
of the Declaration of Independence penned the famous line, "All men are
created equal," he had in mind equality before the law, and that he
recognized fully the duty of the state through education to provide equality
of opportunity. A prominent American educator argues that our concepts


"And unto one he gave five talents, to another two, and to another one; to
every man according to his several ability." Matthew 25:15.

Jean Jacques Rousseau, The New Heloise, Part V, Letter 3.

Thomas Jefferson, Notes on Virginia, pages 250-252.

Newton Edwards, "We Need New Purposes in Education," Phi Delta Kappan,
28: 10, September, 1946.


of freedom and equality are "outmoded" and cannot both be realized, since
they "are in fact mortal enemies."

It is surprising, therefore, to find that the problem of individual
differences was not seriously treated in psychology before the time of Galton
in the latter half of the nineteenth century, a neglect which has been
characterized as perhaps the "most extraordinary blind-spot in previous
psychology."

Otto estimates that during the last twenty years more time and effort
in educational research have been devoted to the study of individual
differences than to any other single topic; he is greatly impressed with the
extensive literature available. Yet in 1925 a competent school psychologist
stated that the "schools heretofore have to a large extent ignored these
differences." Five years later a national survey revealed that "provisions
for individual differences, in general, are innovations in the secondary
schools." In 1936 a survey of 300 courses of study showed that only about
one in ten "contain any suggestions for adapting instruction to
individuals." Eight years later an educational psychologist characterized as
largely "lip service" the attention educators give to individual differences.
Davis says:

Despite its philosophy of individualization, the school, in practice, fosters a
program of regimentation and standardization.

In the meantime, better enforcement of the compulsory education laws
and the rapid increase of secondary-school enrollments have served but to
intensify the problem, the nature of which was more accurately revealed
by scientific measurement.

Group differences. Scientific research, on the whole, has shown that
differences between groups are not so great as they are commonly assumed

Gardner Murphy, Historical Introduction to Modern Psychology (Revised
Edition), page 117. New York: Harcourt, Brace & Company, Inc., 1949.

Henry J. Otto, Elementary School Organization and Administration (Second
Edition), page 160. New York: D. Appleton-Century Company, 1944.

A. Sutherland, "Factors Causing Maladjustment of Schools to Individuals,"
Twenty-Fourth Yearbook of the National Society for the Study of Education,
Part II, pages 29-30. Bloomington, Illinois: Public School Publishing
Company, 1925.

Roy O. Billett, Provisions for Individual Differences, Marking, and Promotion,
National Survey of Secondary Education, Monograph No. 13, page 8.
Washington, D. C.: United States Office of Education, 1932.

Henry Harap, "Differentiation of Curriculum Practices and Instruction in
Elementary Schools," Thirty-Fifth Yearbook of the National Society for the
Study of Education, Part I, page 162. Bloomington, Illinois: Public School
Publishing Company, 1936.

Robert A. Davis, "Experimenting in Education," Educational Administration
and Supervision, 30: 1-16, January, 1944.

For an excellent summary, see A. R. Gilliland and E. L. Clark, Psychology
of Individual Differences, 535 pages. New York: Prentice-Hall, Inc., 1939.
A standard work in this area is Anne Anastasi and John P. Foley, Jr.,
Differential Psychology: Individual and Group Differences in Behavior
(Revised Edition), 894 pages. New York: The Macmillan Company, 1949.





to be. There is little basis for the widespread illusion that the group of
which one happens to be a member is superior, while all others are inferior.
The intellectual differences between the sexes, for example, are in general
slight. Furthermore, all levels of mental ability are found in all economic,
occupational, and social groups, although not in the same proportions. Even
the differences between races have been grossly exaggerated, and such
differences as appear probably reflect cultural rather than innate
intellectual variations. It is manifestly impossible to make adequate
provision for individual differences by classifying pupils for instructional
purposes according to the social, economic, occupational, racial, or other
similar group from which they come. Fortunately, perhaps, for democracy the
problem is not so simple as that.

Almost without exception the average differences between groups are
less significant than the differences within any single group. An
example of this is the enormous overlapping among school grades. Although
the average difference in intelligence and in achievement between successive
school grades rarely exceeds one year, the range within any one grade
is likely to be at least four or five years. As a matter of fact, Baker points


Figure 43. General Quality ... (Committee judgment; normal and actual
distributions.)

out that the achievement ... halves, of two adjacent grades is usually ...
two halves of the same ... the Pennsylvania study showed ... 10 per cent of
the high school seniors exceeded the median of the college seniors, while
nearly 10 per cent

Yearbook, op. cit., pages 137, 45.





of the college seniors fell below the median of the high-school seniors.
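
The amount of overlap implied by such figures can be sketched under a
simplifying assumption that the text does not state: that both groups are
normally distributed with the same spread. The function names and the
within-grade standard deviation of 1.2 years below are hypothetical, chosen
only to give a within-grade range of roughly four to five years.

```python
from statistics import NormalDist


def share_above_next_median(mean_gap, sd):
    """Share of the lower group scoring above the median of the higher group.

    Assumes both groups are normal with the same standard deviation `sd`
    and that their means differ by `mean_gap` (same units as `sd`).
    """
    lower_group = NormalDist(mu=0.0, sigma=sd)
    return 1.0 - lower_group.cdf(mean_gap)


def gap_in_sd_units(share_above):
    """Mean gap, in standard deviations, implied by a given overlap share."""
    return NormalDist().inv_cdf(1.0 - share_above)


# Adjacent grades one year apart, hypothetical within-grade SD of 1.2 years.
print(round(share_above_next_median(1.0, 1.2), 3))

# Gap implied when about 10 per cent of one group exceed the other's median.
print(round(gap_in_sd_units(0.10), 2))
```

Under these assumptions about a fifth of the pupils in one grade would stand
above the middle pupil of the grade ahead, and a 10 per cent overlap such as
that reported for the Pennsylvania seniors corresponds to group means a
little more than one standard deviation apart.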

It must not be thought, however, that one group is just like every other.
As a matter of fact, certain types of groups differ from each other very
much as the individuals within any one group differ from each other. For
example, Figure 43 shows the distribution of the ratings of 200 high schools,
which closely approximates the normal curve. It is quite likely that all the
schools in a single state would be similarly distributed on practically every
characteristic.



Figure 44. Distribution of Mean Scores of Seniors in Forty-Nine Colleges in
Pennsylvania on a Test of General Academic Knowledge. (Data from The Student and
His Knowledge, page 78.)


Figure 44, although somewhat asymmetrical, shows that, on the basis of 
the mean achievement of their seniors, 49 colleges in Pennsylvania have 
the wide range and the heavy concentration near the center that charac- 
terize normal curves. In other words, there are differences among institu- 
tions just as there are among individuals. It is this fact that makes the 
traditional classification of schools for accrediting purposes such a baffling 
problem, and that has been responsible for the trend toward evaluating 
each school in relation to its own objectives and program rather than in 
relation to other schools. It is now being recognized that it is just these 
differences that give individuality and distinction to institutions. 

William S. Learned and Ben D. Wood, The Student and His Knowledge, page 21.
New York: Carnegie Foundation for the Advancement of Teaching, 1938. For a later 
illustration from the results of the Army General Classification Test (AGCT), see: 
Walter V. Bingham, “Inequalities in Adult Capacity— from Military Data,” Science, 
104: 147-152, August 16, 1946. 




Individual differences. In contrast with the differences between groups,
which have frequently been overestimated, the differences within the group
have usually been underestimated. While a vague notion of individual
differences has long been in existence, no adequate knowledge of the nature
and extent of these differences was possible before the appearance of
scientific measurement. Such profound thinkers as Plato, for example,
believed that all persons fell into a few rather distinct groups. In fact, the
idea that many human abilities are distributed on a continuum, with a
concentration near the middle, is a modern conception.



Figure 45. Distribution of ... Stanford-Binet ... CA's 2 to 18 years. (From
Terman and Merrill, Measuring Intelligence, page 37.)


... picture available of the distribution of intelligence ... American-born
white ... page 260 shows a ... distribution for ninth-grade pupils. Three
characteristics of the so-called "normal curve" should be noted: (1) the wide
range from lowest to highest scores, (2) the continuous distribution, with no
breaks, and (3) the ... Many distributions of test scores have this ...
Skewed curves differ from symmetrical ... the concentration is not exactly at
the center. In a ... distribution approximately two thirds of the pupils lie
within ... distance from the mean. After a survey of the experimental
evidence, Hull says:

Lewis M. Terman and ... Merrill, Measuring Intelligence, page 37. Boston ...

... Yonkers ... Company, 1928.





We shall probably not be in great error if we conclude that among individuals
ordinarily regarded as normal, in the average vocation the most gifted will be
between three and four times as capable as the poorest.
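
The "two thirds" figure mentioned earlier for the normal curve can be
checked directly. The sketch below assumes that the distance in question is
one standard deviation on either side of the mean; the function name
`share_within` is hypothetical.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Shares within one and within two standard deviations of the mean.
print(round(share_within(1), 4))
print(round(share_within(2), 4))
```

The first value is a little over two thirds, and the second shows that
nearly all cases lie within two standard deviations, which is why the
extremes of the range are so thinly populated.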

Important as is the ^\ide range of abihlj' between the two extremes, the 
importance to education of the continuous distribution is equally great 
On no trait do indi\nduals naturallj fall into a feu distinct groups, such a« 
‘infenor,” “average " and “superior,” or “dull ” “normal,” and “bnght ' 
Such so-called “Ijiies” are purelj arbilrarj' It \NOuld be po'^sible to make 
an equallj good ca«e for an^ other number of classes “In a literal sen'^e 
e\en.one is exceptional There are similar difTcrcnccs in nonintcllectual 
traits 
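
The figure of approximately two thirds of the pupils lying within one standard deviation of the mean can be checked numerically. The following is a minimal sketch in Python and is not part of the original text. The function name `fraction_within` is hypothetical, and the calculation assumes an exactly normal distribution, which actual test scores only approximate.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard
    deviations of the mean (assumes an exactly normal curve)."""
    # For a standard normal variable Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} SD of the mean: {fraction_within(k):.4f}")
```

For one standard deviation the result is a little over two thirds, in agreement with the statement in the text.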

Trait variability. Not only are there differences among groups, and
differences among the individuals of any one group, but there are also
important differences among the traits making up any particular
individual. Hull made a careful study of these differences and came to the
conclusion that "the distribution of talent within an individual follows the
normal law much as do the distributions of individual differences." Not
only did he observe a "distinct tendency to approach the characteristic
shape of the normal probability curve," but he also found evidence that
"the average individual's best vocational potentiality must be between two
and one-half and three times as good as his worst." The importance of
these trait differences in an individual for educational and vocational
guidance can hardly be overemphasized. It is also apparent that satisfactory
ability grouping in one trait may be wholly unsatisfactory in other traits.
The educational problem is further complicated by the fact that, while
intercorrelations of these traits are usually positive, the correlations are far
from perfect. This means that when an attempt is made to secure a group
homogeneous in one factor, it is still heterogeneous with respect to other
factors. It is apparent, therefore, that it is impossible to make groups truly
homogeneous for instructional purposes, even if it were desirable to do so.
The best that can be done is to reduce the amount of heterogeneity.
Opponents of ability grouping have made much of this point, apparently quite
oblivious to the fact that ipso facto they are attacking a straw man, for,
manifestly, one need shed no tears over the dangers of an educational
situation which one's own data prove to be a physical impossibility.
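
The point that a group made homogeneous in one trait remains heterogeneous in others can be illustrated with a small calculation. The sketch below is an illustration only and is not taken from the original text. It assumes two traits following a bivariate normal distribution with correlation r, and a group made perfectly homogeneous in the first trait. Under those assumptions the standard deviation of the second trait within the group is sqrt(1 - r^2) times its value in the unselected population. The function name and the sample correlations are hypothetical.

```python
import math


def remaining_sd_fraction(r: float) -> float:
    """Fraction of the original standard deviation of a second trait that
    remains in a group perfectly homogeneous on a first trait.

    Assumes the two traits are bivariate normal with correlation r.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    for r in (0.3, 0.5, 0.6, 0.9):
        print(f"r = {r:.1f}: {remaining_sd_fraction(r):.2f} of the spread remains")
```

Under these assumptions, a correlation as high as .60 between two traits still leaves 80 per cent of the original spread in the second trait, which is the sense in which grouping can only reduce heterogeneity.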
The concept of the versatile individual who is equally gifted in a
considerable number of directions is largely a fiction and as an educational
ideal is capable of doing much harm. It has been ridiculed as follows:


Edmund S. Conklin and Frank S. Freeman, Introductory Psychology for Students
of Education, page 515. New York: Henry Holt & Company, 1939.
Clark L. Hull, op. cit., page 46.
Ibid., pages 46-49.
Amos E. Dolbear, "Antediluvian Education," Journal of Education, 68:424, 1908.




If an animal had short legs and good wings, attention should be devoted to
running, so as to even up the qualities as far as possible.

So the duck was kept waddling instead of swimming. The pelican was kept
wagging his short wings in the attempt to fly. The eagle was made to run, and
allowed to fly only for recreation.

All this in the name of education. Nature was not to be trusted, for
individuals should be symmetrically developed and similar, for their own
welfare as well as for the welfare of the community.

The animals that would not submit to such training, but persisted in developing
the best gifts they had, were dishonored and humiliated in many ways. They were
stigmatized as narrow-minded and specialists, and special difficulties were
placed in their way when they attempted to ignore the theory of education
recognized in the school.

No one was allowed to graduate from the school unless he could climb, swim,
run, and fly at certain prescribed rates; so it happened that the time wasted
by the duck in the attempt to run had so hindered him from swimming that his
swimming muscles had atrophied, and so he was hardly able to swim at all; and
in addition he had been scolded, punished, and ill-treated in many ways so as
to make his life a burden. He left school humiliated, and the ornithorhynchus
could beat him both running and swimming. Indeed, the latter was awarded a
prize in two departments.

The eagle could make no headway in climbing to the top of a tree, and although
he showed he could get there just the same, the performance was counted a
demerit, since it had not been done in the prescribed way.

An abnormal eel with large pectoral fins proved he could run, swim, climb
trees, and fly a little. He was made valedictorian.

Educational provisions for individual differences. Attention has
already been called to the fact that few schools are making adequate
provisions for the individual differences existing in their pupils. No point
stood out more prominently in Billett's study than this. Table 35 summarizes
the situation for secondary schools in 1930. Billett reduces these provisions
to seven categories: (1) homogeneous grouping, (2) special classes, (3) plans
characterized by the unit assignment, (4) scientific study of problem cases,
(5) variation in pupil load, (6) out-of-school projects and studies, and
(7) advisory or guidance programs. Of these the first three have been
found to be core elements in a typically successful program to provide for
individual differences. But it will be noted from the last column that
the most successful provision, in the opinion of those using it, was
homogeneous grouping, which had a ratio of 26 per cent. In other words, hardly
more than one principal in four or five using any of these plans has a
considerable degree of confidence in them.
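
The ratio in the last column of Table 35 is the number of schools reporting a provision in use with unusual success divided by the number reporting it in use at all. A minimal sketch of that arithmetic follows, using three rows whose figures are read from Table 35. The helper name `success_ratio` is hypothetical, and rounding to two decimal places is an assumption about how the table was prepared.

```python
def success_ratio(in_use: int, with_success: int) -> float:
    """Share of schools using a provision that report unusual success,
    rounded to two decimal places (rounding rule is an assumption)."""
    if in_use <= 0:
        raise ValueError("number of schools using the provision must be positive")
    return round(with_success / in_use, 2)


# (number in use, number in use with estimated unusual success),
# as read from Table 35.
rows = {
    "Variations in number of subjects a pupil may carry": (6428, 793),
    "Homogeneous or ability grouping": (2740, 721),
    "Opportunity rooms for slow pupils": (946, 172),
}

if __name__ == "__main__":
    for provision, (in_use, with_success) in rows.items():
        print(f"{provision}: {success_ratio(in_use, with_success):.2f}")
```

The value for homogeneous grouping reproduces the ratio of 26 per cent cited in the text.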


Roy O. Billett, op. cit., pages ...
Ibid., page 11.





TABLE 35

FREQUENCIES WITH WHICH VARIOUS PROVISIONS FOR INDIVIDUAL DIFFERENCES
WERE REPORTED IN USE OR IN USE WITH UNUSUAL SUCCESS,
BY 8,594 SECONDARY SCHOOLS IN 1930 (AFTER BILLETT)

                                              Provision in Use  Provision in Use
                                                                with Estimated
                                                                Unusual Success
Nature of Provision                           Number  Per Cent  Number  Per Cent    Ratio*

 1. Variations in number of subjects a
    pupil may carry                            6,428        75     793         9      .12
 2. Special coaching of slow pupils            5,009        59     781         9      ...
 3. Problem method                             4,216        49     444         5      ...
 4. Differentiated assignments                 4,047        47     788         9      ...
 5. Advisory program for pupil guidance          ...        42     540         6      ...
 6. Out-of-school projects or studies            ...        40     419         5      .20
 7. Homogeneous or ability grouping            2,740        32     721         8      .26
 8. Special classes for pupils who have
    failed                                     2,612        30     ...         4      ...
 9. Laboratory plan of instruction               ...        30     323         4      .12
10. Long unit assignments                      2,312        27     340         4      .15
11. Project curriculum                         2,293        27     ...         4      .16
12. Contract plan                              2,293        27     ...         5      .20
13. Individual instruction                     2,175        25     309         4      .14
14. Vocational guidance through exploratory
    courses                                    1,911        22     186         2      .10
15. Educational guidance through exploratory
    courses                                    1,900        22     193         2      .10
16. Scientific study of problem cases          1,343        16     146         2      .11
17. Psychological studies                      1,077        12      70         1      .06
18. Opportunity rooms for slow pupils            946        11     172         2      .18
19. Morrison plan                                737         9     175         2      .24
20. Special coaching to enable capable
    pupils to skip a grade or half grade         726         8     114         1      .16
21. Promotions more frequent than each
    semester                                     686         8     103         1      .15
22. Remedial classes or rooms                    593         7      90       ...      ...
23. Adjustment classes or rooms                  544       ...     ...       ...      .10
24. Modified Dalton plan                         486       ...     ...       ...      .11
25. Opportunity rooms for gifted pupils          322       ...     ...       ...      .21
26. Restoration classes                          ...       ...     ...       ...      .13
27. Dalton plan                                  162       ...     ...       ...      .09
28. Winnetka technique                           119       ...      14         0      .12
29. Other techniques                             101         1     ...       ...      ...

* Ratio of Number of Provisions in Use to Number in Use with Estimated
  Unusual Success.

Table 36, based on returns from 48 large and 58 small cities, shows trends
in the elementary school. Increasing attempts to introduce more flexible

V. V. Caldwell, "Some Facts Regarding Elementary School Trends," School and
Society, 49:285-288, March 4, 1939.




TABLE 36

TRENDS TOWARD GREATER ... FOR INDIVIDUAL DIFFERENCES IN
ELEMENTARY SCHOOLS (AFTER CALDWELL)

[Column entries for large and small cities (number and per cent) are not
recoverable; only the row headings survive.]

A. Daily Schedule
    1. Longer ...
    2. Flexible program
    3. Subject-matter ... eliminated
    4. Skill-content-creative activities

B. Curriculum Content
    1. Child experiences as learning basis
    2. Elimination of specific subjects
    3. More freedom for teacher in interpreting course
    4. Emphasis on habits, not fact-learning
    ...
   10. ... of health as ...
   11. Provision for ...
   12. ... activity program ...

C. ...
    ...
    2. Automatic lighting equipment
    ...
    7. Isolation of sick ...
    8. More ...
    9. Provision for safe ...
   10. Provision for ... weather
   11. Provision for play ...

D. Materials (content)
    1. Basic texts eliminated
    2. Wide reading ...
    3. Elimination of ...
    4. Variety of material for ... work

E. Classification
    1. Provision for ...
    2. Use of reading-readiness tests
    3. ...
    4. Grouping by social age
    5. Use of no ...
    6. Reduction in ...
    7. Use of tests for ..., not promotion
    8. Reduction in number ... per teacher




educational programs are shown. It is also apparent that much yet remains
to be done. But it is encouraging to note that somewhat more than a third
of these schools were using reading-readiness tests and other tests for
diagnostic and guidance purposes rather than merely as a basis for promotion.


B. The Activity Movement

In recent years no program of instruction has received more attention
among educators than the activity movement, usually a prominent feature
of the so-called "progressive schools." Yet educational historians assure
us that the principle that man learns by doing is "as old as man's earliest
education." In fact, its roots lie further back than the beginning of
formal education in schools. Its advocates go even further and assure us that
it is grounded in the fundamental nature of the learner himself. However,
there are such wide divergencies among its champions, both in theory and
in practice, that it may be said that the activity movement not only
recognizes individual differences to an astonishing degree, but also actually
demonstrates such differences. The essential features of this educational
program may be briefly, and somewhat inadequately, described as follows:

1. Education results from the child's own purposeful activity with
processes considered personally vital to him. An activity, according to
Kilpatrick, is "a unitary sample of actual child living as nearly complete and
natural as school conditions will permit." At every stage the organism
reacts as a whole and the physical, intellectual, and emotional experiences
are interrelated.

2. Learning is inherent within the life process itself. It results naturally
from the learner's self-directed purposeful activity. Teaching, like learning,
is individual in character, arising from a felt need. The teacher is only a
guide, and all subject matter is merely a tool. The activity program clearly
places upon the shoulders of the classroom teacher the difficult problem
of adjustment to individual differences.

3. Interest is at all times the motivating factor in the learning process.
Although all teaching procedures recognize the value of interest, the
activity movement emphasizes more than any other program the importance of
inner drives and interests of the individual pupil, as opposed to extraneous
motivation of any kind.

4. The development of the learner's personality, rather than the
accumulation of facts and skills, is the objective of all learning. The
personality of each individual will develop in accordance with his own
abilities, interests, and personal experiences.

5. The evaluation of this relatively intangible personal development
involves a fairly long time-span and, therefore, lends itself more to
qualitative than to quantitative judgment. In the evaluation process the pupil
himself is an active participant. According to Dewey, "the more mature
and experienced the teacher, the less will he or she be dependent upon
tangible, directly applicable external tests, and will use them, not as final
but as guides to judgment of the direction in which development is taking
place."

Thomas Woody, "Historical Sketch of Activism," in Thirty-Third Yearbook of the
National Society for the Study of Education, Part II, pages 9-43. Bloomington,
Illinois: Public School Publishing Company, 1934. Quoted by permission of the
Society.
Thirty-Third Yearbook, op. cit., page 63. Quoted by permission of the Society.

It should not be overlooked, however, that regardless of the relative
emphasis, such activities as reading and arithmetic are always going to be
... able is no justification for ... the tools for which do exist ... of
suitable tools no ... probably greater, as Gates suggests: "Any scheme of
education that ... of measurements than ... en masse."

C. Homogeneous or Ability Grouping

Individual and group instruction. It has sometimes been erroneously
assumed that there is a ... between individual and group instruction. While
all ... individual learning, it can take place in a group setting, and
certain types of ... can take place only in a group setting, for the
individual ... the group, he learns from the group as well. In other words,
the important question is: What kind of group organization best promotes
individual learning? The problem is ... extremes of a complete tutorial
system ...

... group intelligence tests ... for grouping pupils in school ... It became
evident, however, ... other characteristics ... somewhat ... a more accurate
term, although frequently ...


grouping." While much confusion still exists, many writers have recently
attempted to make a distinction between these terms. Instructional groups
which are made less heterogeneous in learning ability, usually by the
employment of general intelligence tests, are called "ability groups." Groups
formed upon the basis of some common interest, social maturity, or other
similar basis, are called "homogeneous groups." An activity in a progressive
school, although made up of pupils of varying abilities, is certainly
homogeneous from the standpoint of the objective sought. Most of the
criticism of grouping is directed against groups formed on the basis of
ability. Doubtless, nobody would desire a group possessing the maximum
degree of heterogeneity, even in intellectual ability, and certainly not in
chronological age, physical maturity, background, motivation, and the like.
It is probable, therefore, that everybody wants a group with a certain degree
of homogeneity. The differences arise regarding the degree and basis of the
homogeneity.

Arguments for and against ability grouping. An imposing list of a
dozen or more arguments for, and an equal number against, ability grouping
has been assembled. The crucial point at issue is: Do groups formed
upon the basis of ability aid or hinder learning? Among the alleged
advantages it is argued that ability grouping makes it easier to adapt
instructional materials and methods to the individual pupil, thereby
stimulating bright pupils and encouraging dull pupils, with the result that
achievement is increased and failure reduced. Among the alleged disadvantages,
on the other hand, it is argued that the system is essentially undemocratic
and that any gains in academic achievement are likely to be slight in amount
and purchased at too dear a price, since the bright pupils tend to graduate
too young and to develop a sense of superiority, while dull pupils may
overwork or may develop a sense of inferiority. Here as always, however, it is
impossible to decide a scientific question merely by counting the arguments
pro and con or by attempting to weigh the logic or fervor with which they
are advanced. Fortunately, on this problem a considerable amount of
experimental work has been done, although most of the studies must be
characterized as inadequate and inconclusive.

The experimental evidence. Summaries of the experimental literature
relating to ability grouping have been made by Billett, Wyndham,

Cf. Henry J. Otto, Elementary School Organization and Administration (Second
Edition), page 184. New York: D. Appleton-Century Company, 1944.
For rather complete summaries of the arguments see Austin H. Turney, "The
Status of Ability Grouping," Educational Administration and Supervision,
17:23, January, 1931; Ninth Yearbook of the Department of Superintendence,
pages 121-126. Washington, D. C.: National Education Association, 1931; and
Ernest W. Tiegs, Tests and Measurements in the Improvement of Learning,
pages 262-264. Boston: Houghton Mifflin Company, 1939.
Roy O. Billett, op. cit., pages 16-37.
Harold S. Wyndham, Ability Grouping, pages 128-159. Melbourne, Australia:
Melbourne University Press, 1934.





Cornell, and various writers in the Review of Educational Research.
In 1931 a foreign observer commented upon the "haphazard condition"
of the research upon the problem and pointed out that the experimental
studies "raise more issues than they settle."

Ten years later an American educator saw "little or no solid, objective
evidence upon which to base a decision as to the effectiveness of ...
learners at every level ..." ... in the ... conclusion is as follows:

The results of ability grouping seem to depend less upon the fact of grouping
itself than upon ... grouping, the accuracy with which grouping is made for
the purposes intended, the differentiations in content, method, and speed,
and the ... teacher, as well as upon more general environmental influences.
Experimental studies have in general been too piecemeal to afford a true
evaluation of ... both ... well adapted to further the adjustment ...
objective and subjective ... grouping.

The above statement is not very ... reasonably clear that the evils ... by
its opponents need not occur, and, on the other hand, that the alluring
advantages claimed by its advocates may not ... In other words, there is no
money-back guarantee with ability grouping; at best it merely affords more
favorable conditions for doing something about the problem of individual
differences. The ... must be in terms of properly differentiated curricula
and teaching methods. On this point Otto says: "All authorities are ...
classification scheme can remove the need for adjusting instructional
materials ... methods to the varying needs of pupils in the group."

Upon certain important issues, unfortunately, there has been little or no

Ethel L. Cornell, "Effects of Ability Grouping Determinable from Published
Studies," Thirty-Fifth Yearbook of the National Society for the Study of
Education, Part I, pages 289-304. Bloomington, Illinois: Public School
Publishing Company, 1936. Quoted by permission of the Society.
... of volume ... 193...
... page 290. New York: ... Book Company ...
Ibid., page ... Quoted by permission of the Society.
... Quoted by permission of the Society.
Henry J. Otto, op. cit., page 190.




experimentation. No one, for example, has determined the effect of various
methods of adapting work to pupils of different levels of ability. This is
especially important, since the methods actually employed have usually
been most effective for dull learners and least effective for bright learners.
In most cases, probably, the methods have been those which are used with
ordinary heterogeneous groups, and which appear to be least appropriate
to the more capable individuals.

Nor has there been any convincing experimental attack to determine the
effect of ability grouping upon the work habits and mental health of the
pupils. Such meager results as do exist are favorable. Maller found evidence
that such desirable social traits as co-operation were developed better
under a system of ability grouping. It is a common observation that the
best competition in sports, such as golf and tennis, is among those who
"play about the same kind of game." Additional evidence that homogeneity
is an attribute of natural social groups is afforded by the numerous studies
which have shown that there is a positive correlation between friends of
all ages, as well as between husbands and wives, on practically all
personality traits investigated. Partridge points out that several studies have
revealed a greater similarity among friends in mental age than in
chronological age. But the main reliance so far has been upon questionnaire
studies, of which the most extensive is by Sauvain. One study of the
attitude of 645 junior high school pupils toward ability grouping came to the
conclusion "that the great majority are happy and satisfied and that
they accept and believe in the grouping that exists as the best situation
for them." That the opinions of parents as well as of teachers were
favorable to ability grouping in the cities where it was employed is indicated
by the following conclusions:

On the whole, where grouping is used parents believe that children are at
least as happy, do better work in school, and are correctly sectioned
according to ability.

Teachers seem to like ability grouping somewhat more than do the parents.

They believe that grouping improves social attitudes, leads to better work by
pupils, and increases the happiness of children.


Julius Bernard Maller, Cooperation and Competition, page 163. New York: Bureau
of Publications, Teachers College, Columbia University, 1929.
Helen M. Richardson, "Studies of Mental Resemblance between Husbands and
Wives and between Friends," Psychological Bulletin, 36:104-120, February, 1939.
E. DeAlton Partridge, Social Psychology of Adolescence, Chapter V. New York:
Prentice-Hall, Inc., 1938.
Walter Howard Sauvain, A Study of the Opinions of Certain Professional and
Non-Professional Groups Regarding Homogeneous or Ability Grouping, 151 pages.
New York: Bureau of Publications, Teachers College, Columbia University, 1934.
Austin H. Turney and M. F. Hyde, "The Attitude of Junior High School Pupils
toward Ability Grouping," School Review, 39:606, October, 1931.
Walter Howard Sauvain, op. cit., pages 115-116.





The technique of ability grouping. There is no general agreement
as to the best basis for ability grouping. In fact, there is probably no one
"best basis." Much depends upon the local conditions, the data available,
the nature of the subject, the size of the school, the fundamental philosophy
of the school, and the like. It is often true, as one writer suggests, that "the
soundest policy in dealing with educational measurements is to obtain
objective data and interpret them subjectively." Nor is there uniformity
in either theory or practice regarding the number and size of the groups,
the proper differentiation in methods and curricula, or the relative
emphasis upon acceleration and enrichment for the bright groups.

A useful distinction is made between vertical and horizontal classification.
Vertical classification attempts to bring together pupils of approximately
the same status. The successive grade levels of the ordinary school
represent such an attempt. The basis is usually CA, or some combination
of CA, MA, and EA. The use of the average of the MA and EA, or the
average G-score on an intelligence test and a general ... much to commend
it ... classification ... For this ability grouping in the academic ...
according to ... combination of IQ and CA, is probably most often employed.
Cole ... two-way distribution of IQ and CA, divided by ... effectively for
this purpose. In the ... sometimes better than general intelligence tests.
... the purpose is to bring together for instructional purposes the pupils
... represent approximately the same educational and mental ... progressing
in the subject at about the same rate. The system should be flexible enough
to permit the shifting of pupils ... whenever it is evident they are ...
classified in that subject. For non-academic subjects, ... music, for
extracurricular activities, and possibly for the ... high school, the groups
may be as heterogeneous ... is inherently democratic. Small schools are of
necessity limited to informal groupings made within the classroom.
But the most important ... adjustment yet remains for the classroom
teacher. She must ... the individual pupils in her class, whenever necessary
must ... temporary groups for remedial instruction, and must vary ...
instructional materials and teaching methods as conditions seem to warrant.
... analysis, the adjustment of the

... Orleans, ... Thomas Nelson ...





school to individual differences becomes a teaching problem. As McCall
says, "But after all, how pupils are taught and not how they are grouped
is the vital matter." Turney puts the matter concisely:

The actual sectioning is but a minor part of ability grouping; the real job
rests with the teachers. To adjust subject matter so that a child can use his
mental ability and to adjust method so that he will use it -- these are the
outstanding problems, for it is idle to talk of effective development unless
children can and do use their mental ability.

Special classes are sometimes formed for pupils at the extremes of the
distribution, although in high school those for the very slow learner are
much more frequent than those for the very bright. In such classes the
teaching is highly individualized. Patience, skill in diagnosing pupil
difficulties, and training in mental hygiene are important qualifications for
teachers of slow classes. High intelligence, versatility, sound scholarship,
and a thorough grounding in psychology are essential qualifications for
teachers of special classes for bright pupils.

It is doubtless possible to overdo the idea of "special" classes and schools
of one sort or another. Although in a real sense every pupil is unique and
should receive special attention, it would certainly be a grave mistake to
become so occupied with the "exceptional" pupils as to overlook adequate
educational provision for the larger group, who are, to all intents and
purposes, "perfectly normal." This situation has been satirized as follows:


Johnny Jones has lost a leg,
Fanny's deaf and dumb,
Marie has epileptic fits,
Tom's eyes are on the bum.
Sadie stutters when she talks,
Mabel has T. B.,
Morris is a splendid case
Of imbecility.
Billy Brown's a truant,
And Harold is a thief,
Teddy's parents gave him dope
And so he came to grief.
Gwendoline's a millionaire,
Gerald is a fool;
So every one of these darned kids
Goes to a special school.
They've specially nice teachers,
And special things to wear,
And special time to play in,
And a special kind of air.
They've special lunches right in school,


William A. McCall, op. cit., page 168.
... pages ...3-115. Quoted by permission of the Society.
Roy O. Billett, op. cit., page 196.
... Education, page ... New York ...




I -- it makes me wild! --
I haven't any specialties,
I'm just a normal child.


Acceleration and retardation. In the elementary school a common
device for reducing the heterogeneity of the class is to eliminate the
extremes of the distribution at promotion time. To do this a small number of
the most capable pupils are allowed to "skip" a grade or half grade, and
usually a larger number of the least capable pupils are "failed," or
"retained" in the same grade for another year or half year. Witty and Wilkins
published a critical survey of the literature relating to acceleration, and,
in spite of certain limitations in the studies, concluded that "most reports
show clearly that acceleration, when practiced, is associated with desirable
adjustment in all types of development for which data have been
assembled." One experiment in which pupils allowed to skip a grade were
paired with pupils of like ability not skipped led to the conclusion that
"under reasonably favorable conditions skipping is a satisfactory method
of accelerating pupils of superior ability."

Recent studies have attempted to determine the effect of acceleration
upon the pupils' personality and social adjustments in high school and
college, apparently accepting Terman's verdict regarding the academic
achievement of superior pupils: "The earlier they enter college the better
work they do there, at least down to an entrance age of 15 years."
Almost without exception the results appear to be favorable. Engle, for
example, found that accelerated students in high school when compared
with other students of their own chronological age were "at least as active
socially as non-accelerated students." In 1943, Pressey surveyed the
literature regarding acceleration on the college level and came to the
conclusion that "the great majority of accelerated students do well in school,
are socially adjusted, do not suffer in health, and are not handicapped in
after-school career."

During World War II great emphasis was placed upon accelerated programs
of education, particularly on the college level. Several colleges have
attempted to investigate the effect of these programs upon the students.


Paul A. Witty and Leroy W. Wilkins, "The Status of Acceleration or Grade
Skipping as an Administrative Practice," Educational Administration and
Supervision, 19: ...
... Adams and C. C. Ross, "Is ..."
... "... His Academic Environment," School and Society, 49: ...
... "... Acceleration upon the Personality and Social Adjustments of High
School and University Students," ...
... Pressey, "Acceleration ..." Educational Research Bulletin, ... February
17, 1943.





Studies directed by Pressey at Ohio State University have been especially
noteworthy. With few exceptions the results have favored acceleration.
Another investigation concludes that many more superior women students
than usually attempt it "can complete a college program in three
years or less without unfortunate effects as regards scholarship, recreation,
health, or after-school career."

In a comprehensive, heavily documented monograph Pressey summarizes
the literature on acceleration and calls upon educators to break the
lock-step curriculum at all levels for the abler students:

There is evidence to suggest that superior children might best begin school
somewhat younger than average children -- that school entrance should be on
the basis of total maturity rather than simply on arrival at the
chronological age of six. In elementary and secondary schools, even though
accelerates have usually not been selected carefully on any broad basis or
helped to adjust to the more advanced work and new companions, a majority of
them have nevertheless continued to show academic superiority and good social
adjustment after acceleration. When accelerates have been carefully chosen as
intellectually superior and well adjusted, and means provided for
facilitating advancement, as by adjustment classes, or sections arranged for
superior students who then move forward together to cover perhaps three years
of junior high school work in two, results have been found almost
distressingly satisfactory. Such students do well in later public-school work
(as in senior high school) and in their relationships with pupils in the
classes into which they have been advanced, sometimes even better than
students of equal ability and academic record at the time the special program
of acceleration began who continued at the regular pace. The conclusion seems
unavoidable that the usual lock step grossly wastes the time of the ablest
young persons. In college, students who entered early as a result of
acceleration similarly show the highest academic record, the lowest academic
mortality, and high participation in campus life.


The weight, both of the arguments and of experimental evidence, appears
to be against failure or retardation as a school policy. In Otto's
survey the literature indicated that about 20 per cent of repeaters do better
and 40 per cent do worse than before. He concluded that if the objective
of the modern school is the optimum development of its pupils,
"non-promotion is not the way to get it." Several studies indicate the value
of trial promotions. An investigation by McKinney, for example, involving
more than 13,000 pupils, shows a saving of about three out of every four

[...] Pressey and S. B. Folk, "First Evaluations of an Accelerated Program
[...]," [...] Engineering Education, 34:477-485, March [...].

"Acceleration the Hard Way," Journal of Educational Research,
37:561-570, April, 1944.

Marie A. Flesher, "An Intensive Study of Seventy-Six Women Who Obtained Their
[...]," [...] Educational Research, 39 [...].

Sidney L. Pressey, Educational Acceleration: Appraisals and Basic Problems
(Bureau of Educational Research Monograph No. 31), page 143. Columbus, Ohio:
Ohio State University, 1949. Italics added.

Henry J. Otto, op. cit., page 232.

H. T. McKinney, Promotion of Pupils: A Problem in Educational Administration.
206 pages. Urbana, Illinois: University of Illinois, 1921.





repeaters. One study has shown that the threat of failure affords ineffective
motivation. Certainly, with a modern curriculum and an adequate program
of diagnosis and guidance, few if any failures should occur.

Continuous promotion. Otto has proposed a somewhat theoretical
but very suggestive promotion plan for the elementary school, which abolishes
not only acceleration and nonpromotion but the term "school grade"
as well. Such a type of organization has been in successful operation in
several school systems for a number of years.

His plan involves the following five essential features:


1. There would be available extensive data of an objective character on
each child so that he may be placed at all times in groups in which he can
[...] in terms of his own developmental readiness.

2. There would be continuous pupil adjustment and progress [...]
from one group to another at any time during the year that a change [...].

3. The reclassifications which take place in the ordinary school at
the beginning of each term [...] teacher-group relationships in which
[...] with the same group of children for two or three
consecutive semesters or years.

[...] system would be replaced with [...] development of each child.



Henry J. Otto, op. cit.






Evaluation in Guidance 


The field of guidance developed rapidly from 1918 onward, concurrently
with the beginnings of mass group testing. Now a vast body of information
and techniques must be mastered before one can qualify as a competent
professional guidance worker, so it is not feasible in an elementary
measurement textbook to do more than make a few brief remarks concerning
evaluation in school guidance programs. The interested student will want to
scan the selected references at the end of this chapter for supplementary
reading material.

A. The Meaning and Importance of Guidance

The fundamental problem of life is adjustment. At birth the human
infant is much less well adjusted to the world in which he must live than
many of the simpler organisms. Man's dominant place in the universe is
due largely to his remarkable capacity for modifying his reactions in the
direction of a more adequate adaptation to the conditions under which he
must live. The process by which these changes take place is called learning,
and the result is called education. The function of the school is to provide
a favorable environment in which these changes may take place. The role
of the classroom teachers and of the school administrators is to stimulate
and to direct the learning process.

The aim of all guidance is to assist the learner to acquire sufficient
understanding of himself and of his environment to be able to utilize most
intelligently the educational opportunities afforded by the school and the
community. The problem of guidance arises from the fact that an immature
but growing individual with a unique combination of abilities and limitations
is confronted with a complex and ever-changing environment. Guidance
used to be regarded as an effort "to see through Johnny and to see
Johnny through." The emphasis today has shifted [to helping]
Johnny see through himself and to see himself through. It seeks to assist
each student to choose, and make satisfactory progress in, those activities
which will contribute most to his development, individual happiness, and
[...].

Certain circumstances have conspired to make guidance one of the most
important responsibilities of the modern school. This is particularly true
of the secondary school and of the college. Figure 12 on page 248 shows
that in 1900, 4 per cent of all students in the United States were enrolled
in secondary schools and 1.4 per cent in institutions of higher learning,
while in [year illegible] the corresponding percentages were [figures
illegible]. During this same period the total number of students doubled,
rising from 17,200,000 to 34,400,000. As a result of these conditions the
student body of the modern secondary school and college represents a greater
diversity of backgrounds, interests, ambitions, and abilities than has ever
been true before.

At the same time science and invention have greatly complicated and
are constantly changing the social and economic world from which these
pupils come and to which they must return. Likewise, the school situation
itself, academically as well as socially, has markedly increased in
complexity. The small high school with a single curriculum leading to college
has tended to give way to larger schools with a more diversified program.
Judd called attention to the fact that the number of subjects offered in
American high schools increased from 9 in 1800 to more than 250 in 1942.
At the present time a pupil of high school age in a large modern American
city has a choice of a score or more different curricula.

As it is always easier for the traveler to lose his way in a large city than
in a small town, especially if he is lacking in maturity and experience, it is
perhaps not surprising that many of those who enter the modern secondary
school and college never succeed in making a satisfactory adjustment to
these institutions. Likewise, the vast number of adolescents and adults
who find their way into penal institutions or into hospitals for the
physically and the mentally ill, and the much larger number of others who lead
unhappy and unsuccessful lives, afford tragic evidence that adjustment
outside the school has been equally unsatisfactory. There seems no escaping
the fact that when the conditions of life increase in complexity, the need
for guidance increases proportionately. The better the guidance program,
the less will be the need for diagnostic and remedial work later on. An
adequate guidance program is the best form of prevention.


George E. Myers, Principles and Techniques of Vocational Guidance, page 4. New
York: McGraw-Hill Book Company, 1941.

Charles H. Judd, "[...] Education and the Baccalaureate Degree," School and
Society, [volume and page illegible], July 11, 1942.




B. The Place of Measurement in Guidance

[...] measurement in guidance [...]. [The first error, less] common now than
in the early [days] of standard testing, is to think of guidance as synonymous
with testing. [But] guidance is more than the giving of tests, no matter how
extensively or carefully done. As a matter of fact, whether or not tests serve
any guidance function depends upon the use made of the results. Here, as
elsewhere, tests are merely tools. The second error, very common today, is to
dismiss measurement altogether and to regard it as wholly unessential to
guidance, if not indeed an actual obstacle. This viewpoint is as extreme as the
first, however. While testing is never everything in guidance, it is almost
always something. In fact, it may be asserted confidently that evaluation in
some form is implicit in the guidance function. Properly used, tests are
valuable aids to self-analysis.

Fowler indicates that many common mistakes will be avoided if users
of tests in guidance will remember at all times that "the only justifiable
reason for using tests in the guidance program is to serve the individual
inventory in counseling." He formulates seven "guiding rules" based upon
this point of view:


1. Any item of the individual inventory, whether it be a test score, a teacher
mark, [or] a fact about the pupil's health, can be interpreted in the
counseling situation only in the light of all the other inventory data having
some bearing on the problem at hand. This is to say, a chief value of test
scores is the check which they provide upon the meaning of other accumulated
facts. In turn, the importance to be accorded test scores in any given case
must be weighed in the light of other data from the individual inventory.
Dependence must be placed upon tests to supply facts when they have not been
accumulated through other means.

2. Test scores, like other items in the inventory, must be interpreted
cautiously until norms are scientifically established for the local situation
and for the particular kind of problem which the pupil presents.

3. The meaning of a test score may not be the same from one pupil to another
because of the differences in other pertinent inventory data. The meaning may
change even for the same pupil from one problem to another or from one time to
another.

4. Real counseling will encourage decisions or judgments only on the basis of
as full an inventory of pertinent facts as possible. Thus several measures are
usually better than just one or two. Likewise, the same dependence will not be
placed upon so-called "interest" or "personality" tests as upon achievement and
aptitude [tests].

5. It is recognized that certain tests are regularly used in the school by the
administrator in pupil classification and curriculum planning. They [...]
teachers in individualizing teaching methods. The data from these same tests
are of even greater use for counseling and should always be recorded in the
[...] record. Tests used by the administrator for these purposes may supplement
the

Fred M. Fowler, "To Inquirers about Tests," Education for Victory, 3:12-[end
page illegible], December 4, 1944.





tests used only by the counselor. This fact should not be overlooked in their
[...].

6. [...] counseling rather than as standards for arbitrary selection (or
rejection) for training and job opportunities.

7. Familiarity with a test, gained through its use, is important. In deciding
to use a new test to measure the same traits, loss of this familiarity should
be weighed carefully against the possible gain in reliability, validity,
usability, and economy.


If these suggestions are kept in mind, the dangers against which Rogers
warns will be avoided.

Counseling is the process of assisting the individual in making the maximum
adjustment to the educational opportunities of his environment in
terms of his abilities, interests, and needs. It is normally a face-to-face
relation between an older, more experienced person and a less mature
person. It is an example of co-operative problem solving. The role of the
counselor is not to make decisions for the pupil, but rather to help him
to solve his own problems intelligently. The pupil's part naturally increases
with his maturity, the ultimate purpose of counseling being so to develop
the pupil's self-reliance that outside help becomes progressively unnecessary.

Counseling is not a new development in education. In some type or other
it has probably existed longer than formal education itself. The difficulty
has been not with the amount of counseling available but with its quality.
Nowhere is the wisdom of the homely American philosopher, Josh Billings,
more apparent than in counseling: "It is better to kno less, than to kno so
mutch that ain't so." The improvement of measurement techniques in the
present century and the development of cumulative record forms have
made it possible to substitute factual data for opinion and hearsay.


C. Guidance Is a Co-operative Venture

In a very real way all persons in contact with the child (teachers,
administrators, specialized educational personnel, clinic staffs, parents,
ministers, law enforcement officials, and others in the community) are
guidance workers. They must work together effectively in order to prevent
juvenile delinquency, inadequate benefit from schooling, vocational
frustration, neurotic inadequacy, and mental illness. This calls for
continuous co-ordinated effort by everyone. No one person or agency can
shoulder the burden alone, for no single group has the necessary training,
experience, and facilities. Legal, medical, social work, economic, and
political considerations often loom fully as important as purely school-based
educational aspects. The teacher should remember eternally that his is a
co-operative task where guidance is concerned. Though it is undoubtedly true in
a certain sense that "teaching is guidance," much more guidance than can be


Carl R. Rogers, "Psychometric Tests and Client-Centered Counseling," Educational
and Psychological Measurement, 6:139-144, Spring, 1946.





given in the classroom under present conditions is essential if America's
children are to develop optimally.


SELECTED REFERENCES FOR FURTHER READING

Bennett, George K., and Seashore, Harold, "Testing for Vocational Guidance," in
[...] Chauncey (Chairman), 1947 Invitational [...], "Exploring Individual
Differences," American Council on Education Studies, Series I, No. 32,
Vol. XII, October, 1948.

Bennett, George K., Seashore, Harold G., and Wesman, Alexander G., A Manual
for the Differential Aptitude Tests (Second Edition). New York: Psychological
Corporation, 1952. 77 pages.

Bledsoe, Joseph C., "Success of Non-High School Graduate GED Students in
Three Southern Colleges," College and University, 29:381-388, April, 1953.

Coleman, William, and Cobb, E. B., Guidance Use of Test Results. Professional
Manual No. 2, Tennessee State Testing Program. Knoxville, Tennessee:
University of Tennessee, November, 1953. 47 pages.

Crawford, Albert Beecher, and Burnham, Paul Sylvester, Forecasting College
Achievement: A Survey of Aptitude Tests for Higher Education, Part I. New
Haven: Yale University Press, 1946. 291 pages.

Darley, John G., and Anderson, Gordon V., "The Functions of Measurement in
Counseling," Chapter 3 in E. F. Lindquist (Editor), Educational Measurement.
Washington, D. C.: American Council on Education, 1951.

Davis, Frederick B., Utilizing Human Talent. Washington, D. C.: American
Council on Education, 1947. 85 pages.

Davis, Frederick B. (Editor), "Educational and Psychological Testing," Review
of Educational Research, 23:1-110, February, 1953. See reviews of the 1949-52
literature concerning tests of general mental ability (Julian C. Stanley),
tests of special aptitude (William G. Mollenkopf), nonprojective tests of
personality and interest (David V. Tiedeman and Kenneth W. Wilson), and
projective tests of personality (John W. M. Rothney and Robert A. Heimann).

Erickson, Clifford E., The Counseling Interview. New York: Prentice-Hall, Inc.,
1950. 174 pages.

Frederiksen, Norman, and Schrader, W. B., "The ACE Psychological Examination
and High School Standing as Predictors of College Success," Journal of Applied
Psychology, 36:261-265, August, 1952.

Froehlich, Clifford P., Guidance Services in Smaller Schools. New York:
McGraw-Hill Book Company, 1950. 352 pages.

Froehlich, Clifford P., and Darley, John G., Studying Students: Guidance
Methods for Individual Analysis. Chicago: Science Research Associates, 1952.
411 pages.

Lefever, D. Welty, Turrell, Archie M., and Weitzel, Henry I., Principles and
Techniques of Guidance. New York: Ronald Press, 1950. 577 pages.

Rogers, Carl R., Client-Centered Therapy: Its Current Practice, Implications,
and Theory. Boston: Houghton Mifflin Company, 1951. 560 pages.

Rothney, John W. M., and Roens, Bert A., Guidance of American Youth: An
Experimental Study. Cambridge, Massachusetts: Harvard University Press, 1950.
269 pages.




Rothney, John W. M., and Danielson, Paul J., "Counseling," Review of Educational
Research, 21:132-139, April, 1951.

Rothney, John W. M., "Interpreting Test Scores to Counselees," Occupations,
30:320-322, February, 1952.

Strang, Ruth, "Major Limitations in Current Evaluation Studies," Educational
and Psychological Measurement, 10:531-536, Autumn (Part 2), 1950.

Strang, Ruth, "7 Ways to Improve the Rating Process," Occupations, 29:107-110,
November, 1950.

Super, Donald E., Appraising Vocational Fitness by Means of Psychological
Tests. New York: Harper & Brothers, 1949. 727 pages.

Traxler, Arthur E., and Townsend, Agatha (Editors), Improving Transition from
School to College. New York: Harper and Brothers, 1953. 165 pages.

Willey, Roy D., Guidance in Elementary Education. New York: Harper and
Brothers, 1951. 825 pages.

Williamson, E. G., and Foley, J. D., Counseling and Discipline. New York:
McGraw-Hill Book Company, 1949. 385 pages.

Wolfle, Dael, and Oxtoby, Toby, "Distributions of Ability of Students
Specializing in Different Fields," Science, 116:311-314, September 26, 1952.



Evaluation of Schools 


A. The Problem of Evaluation

The motivational effects of evaluation programs have long been evident in the
schools. If achievement is rated by tests, both teachers and pupils work to
pass the tests. If progress is appraised in other ways, activities related to
these methods of evaluation are evident in the daily or weekly school program.
The modern point of view is that evaluation is not a series of periodic
examinations applied externally but an intrinsic part of the learning process
with its planning, evaluating cycle. Tests do, of course, have their places in
this process. Viewed thus, methods of evaluation can be one of the most
valuable tools for creating interest and purpose in further learning.


Measurement and evaluation. As used in education, evaluation is a
far more inclusive concept than measurement. Two aspects of evaluation
may be distinguished: (1) data relating to some important aspect of the
school, such as its organization, program, or results, and (2) a set of values
or standards against which these data are interpreted and appraised.
Furthermore, the evaluator's educational philosophy and sense of values will
determine what objectives of the school program he considers to be important,
as well as what data he will look for, or regard as relevant in the
situation. It is apparent that while measurement may be highly mechanical
and at times a routine, evaluation can never be; at every stage evaluation
requires the exercise of mature judgment.

Measurement implies the use of some tool or instrument, such as a test
or scale, and provides a quantitative description of observed phenomena.


Ernest R. Hilgard and David H. Russell, "Motivation in School Learning," page
64 in the Forty-Ninth Yearbook of the National Society for the Study of
Education, Part I (Learning and Instruction). Chicago: University of Chicago
Press, 1950. Quoted by permission of the Society.


This is desirable, but it should never exclude relevant data of a
subjective and qualitative character, or the consideration of outcomes not
immediately observable. Some writers have criticized existing measurement
in education for the reason that it furnished inadequate data for
evaluation. At best measurement merely provides data needed for evaluation;
it is not evaluation per se.

The Sixteenth Yearbook of the Department of Elementary School Principals
of the National Education Association presents a good discussion
of evaluation on the elementary level. The ten chapters of this report are
given below:

I. The Fundamentals of School Appraisal
II. Appraising the School Organization
III. Appraising Administrative and Supervisory Procedures
IV. Evaluating the Curriculum
V. Appraising Methods of Learning and Teaching
VI. Evaluating Socializing Experiences
VII. Measuring the Progress of Pupils
VIII. Estimating the Efficiency of Teachers
IX. Judging School Equipment
X. A Review of the Technics of Appraisal

Note that the term "measuring" occurs in only one chapter heading. The
other terms, "appraising," "evaluating," "estimating," and "judging," all
similar in meaning, imply the use of techniques that go beyond testing and
examining.

The Eight-Year Study of the Progressive Education Association, which
included both elementary and secondary education, and the Three-Year
Study of the Commission on Teacher Education on the college level are
illustrations of an enlarged conception of evaluation. The committee sought
to devise suitable instruments of measurement for outcomes (such as
interests, attitudes, creativeness, and various aspects of thinking) less
tangible than those measured by ordinary tests and examinations. It also
utilized other types of data, such as anecdotal records, family histories,
records of the pupil's activities, and the like.

Possibly no more ambitious example of this enlarged conception of
evaluation is available than the Cooperative Study of Secondary School
Standards.

Cf. Verner M. Sims, "Educational Measurement and Evaluation," Journal of
Educational Research, 38:18-33, September, 1944.

"Appraising the Elementary-School Program," The National Elementary Principal,
16:227-600, July, 1937.

Eugene R. Smith, Ralph W. Tyler, and Evaluation Staff, Appraising and Recording
Student Progress. 550 pages. New York: Harper & Brothers, 1942.

Maurice E. Troyer and C. Robert Pace, Evaluation in Teacher Education. 368
pages. Washington: American Council on Education, 1944.



[Table 37. The six methods of evaluation used by the Cooperative Study of
Secondary School Standards, with the percentage weight given to each:
1. Evaluative Criteria (A. Educational Program: curriculum, pupil activity,
library, guidance, instruction, outcomes; B. Organization and Plant: staff,
administration, plant); 2. General judgments by visiting committees;
3. Growth as measured by standard tests; 4. Success of pupils (A. in college;
B. noncollege); 5. Judgment by pupils; 6. Judgment by parents. Total, 100
per cent. The individual weights are not legible.]


The importance of evaluation. Without some form of evaluation
everything about education becomes a matter of blindly hoping that all
is well. In the critical period shortly before the Civil War, Abraham Lincoln
began an important address with this statement: "If we could first know
where we are and whither we are tending, we could better judge what to do
and how to do it." It is no less true in education than in government that
we must first "know where we are," and especially "whither we are tending,"
before we are in a position to judge intelligently regarding "what to
do and how to do it." In the final analysis it is the function of all attempts
at evaluation to afford a basis for rational action. Apparently, educators
have always recognized this, even if often somewhat vaguely. For example,


For a full account of this study see Evaluation of Secondary Schools: General
Report. 526 pages. Washington, D. C.: Cooperative Study of Secondary School
Standards, 1939. For a briefer statement see How to Evaluate a Secondary
School (1940 Edition). 139 pages. Washington, D. C.: Cooperative Study of
Secondary School Standards.





a college president said, "Self-criticism and self-appraisal (now a
'self-survey' or an 'evaluation') are as old as education."

The emphasis today is more and more upon the importance of self-evaluation.
This holds true for all levels of education, from the activity of
the pupils in an elementary class passing judgment upon the success of a
unit of instruction planned and executed by themselves to the formal
Report of the Harvard Committee.

Many years ago Thorndike pointed out that "the actual changes wrought
in boys and girls by this or that form of education are being measured,
old and new methods are being tested by experiment in the same spirit of
zeal and care for the truth that animates the man of science, and the
educational customs which have been accepted unthinkingly by 'use and wont'
are being required to justify themselves to reason." Although it is probably
true that more progress has been made in that direction since the statement
above was written than in all the centuries preceding, improvements in
evaluation procedures have hardly kept up with those in curricula and
teaching methods.

The difficulty of evaluation. The problem of evaluating education is
immensely complicated. Many approaches toward a solution have been
made, and none has been entirely satisfactory. For many years the various
regional associations attempted to evaluate the secondary schools and
colleges of America indirectly by their possessions rather than directly by
their products. Such measures as the size and qualifications of the staff and
the number of books in the library were at best indications of educational
opportunity, and even to a less extent were such things as the number or
type of buildings in the school plant and the amount of financial support
available. The limitations of such a procedure have been characterized as
follows: "The standards used were mechanical, rather than vital, rigid,
rather than flexible, deadening, rather than stimulating, traditional, rather
than progressive, academic, rather than liberal, broadly comprehensive
and subjective, rather than scientific." Intensive and extensive study of the
problem by committees within the past decade has increasingly revealed
its complexity. The Cooperative Study of Secondary School Standards, for
example, extended over a period of six years and cost about a quarter of a
million dollars. It employed the six major methods of evaluation given in


Henry M. Wriston, "A Critical Appraisal of Experiments in General Education,"
[...] Society for the Study of Education, Part II, page 303. Bloomington,
Illinois: Public School Publishing Company, 1939. Quoted by permission of the
Society.

General Education in a Free Society: Report of the Harvard Committee.
Cambridge, Massachusetts: Harvard University Press, 1945.

[...] "[...] Evaluation," Journal of Educational Research, [volume
illegible]:641-661, 194[?].

Evaluation of Secondary Schools: General Report, op. cit., pages 53-55.





Table 37, and developed three scales, whose composition is given in Table 
38 It will be noted that the complete scale, Alpha, includes 110 different 


TABLE 38

COMPOSITION OF 1940 EDITION OF THE ALPHA, BETA, AND GAMMA SCALES FOR
EVALUATING SECONDARY SCHOOLS

[Number of thermometers in each scale, by area: curriculum and course of
study; pupil activity program; library service; guidance service;
instruction; outcomes; school staff; school plant; school administration.
Totals: Alpha, 110; Beta, 60; Gamma, 25. The entries for the individual
areas are not legible.]





thermometers, all relating to the nine evaluative criteria of the first
[...]. In 1942 Lynch, Pace, and Ziegfeld surveyed
the literature of the field and came to this conclusion: "No simple and
inexpensive technique has as yet been devised, nor is one likely to be devised,
that will provide an evaluation of an entire educational program."
Evaluating teaching efficiency. A single illustration [will show the
complexity] of the problem of evaluation. How can one best judge the worth
of a particular classroom teacher? This is manifestly an important question.
To a large extent the selection, growth, and promotion of teachers
depend upon the answer. In general, the methods used are of three types.

In the first place, tests and rating scales have been devised for measuring
the [qualities] of the teacher. As a rule, these have proved disappointing.
The difficulty with this approach has been clearly pointed out by McCall:
"No one has demonstrated just what causal relationship, if any, exists
between possession of these various attributes and desirable changes in
[...]."
A second method attempts to measure the worth of the teacher by her
performance, usually her activity before the class. For this purpose various
score cards and rating scales have been devised. In fact, the typical rating
scale attempts to secure various measures of the teacher's performance
together with measures of certain traits of personality deemed important
William A. McCall, [title illegible], page 40[?]. [...] The Macmillan Company,
[...].



in teaching. But except as instruments of self-analysis by the teachers
themselves, the practical value of rating scales is slight. For example, when
the gains on the Stanford Achievement Test from November to May by
pupils in four Wisconsin schools were used as a criterion, the correlations
with 17 of the best-known measures of teaching ability available, although
somewhat inconsistent, with few exceptions were so low that they could
reasonably be supposed to have arisen from a population in which the true
relationships were zero.
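
The kind of comparison just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of the Wisconsin study: the function names (`mean_gain`, `pearson_r`), the ratings, and the test scores are all hypothetical, and the criterion is taken to be each class's average gain from the fall to the spring testing.

```python
from math import sqrt


def mean_gain(fall_scores, spring_scores):
    """Average per-pupil gain for one class (spring score minus fall score)."""
    if len(fall_scores) != len(spring_scores) or not fall_scores:
        raise ValueError("need one fall and one spring score per pupil")
    gains = [spring - fall for fall, spring in zip(fall_scores, spring_scores)]
    return sum(gains) / len(gains)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data (not from the source): one rating-scale score per teacher,
# and (fall, spring) achievement scores for the pupils in that teacher's class.
teacher_ratings = [72, 85, 64, 90]
class_scores = [
    ([41, 38, 45], [52, 47, 58]),
    ([50, 44, 39], [58, 55, 49]),
    ([36, 42, 40], [48, 50, 53]),
    ([47, 43, 51], [55, 54, 60]),
]

# Criterion: mean pupil gain for each teacher's class.
criterion_gains = [mean_gain(fall, spring) for fall, spring in class_scores]

# Correlation between the rating and the criterion across teachers.
r = pearson_r(teacher_ratings, criterion_gains)
print(f"correlation of rating with mean pupil gain: r = {r:.2f}")
```

With so few teachers a coefficient of this kind is very unstable, which is one reason low and inconsistent correlations are hard to distinguish from a true relationship of zero.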

A third method has been the attempt to judge the worth of the teacher
by her product, the performance of her pupils. This is certainly the most
direct, and is often asserted to be the only valid, approach. The most
obvious way to achieve this result is to measure the improvement made
by the pupils during a period of instruction under the teacher, but the
problem is far more complicated than it at first appears. Even when
allowances are made for differences in the intelligence and initial achievement
of the pupils, the greater problem remains of determining how much of the
growth is due to natural maturity and how much to the total educational
environment in school and out of school, and the still greater problem of
knowing how much of this improvement is due to the influence of any
particular teacher. Most competent observers today would agree with Traxler
that "the use of test results for rating teachers is seldom advisable."
A summary by Barr notes encouraging progress but emphasizes that
the road ahead is long and difficult:

The influence of any particular teacher is deeply enmeshed in a host of other
school, pupil, and community factors. While very definite progress has been
made in this area, it is not easy to isolate the effects of particular
teachers in particular situations. There is reason to be optimistic about the
use of more precise instruments of measurement in the management of the
teaching personnel, but for the time being, discretion is the best part of
valor.

The Cooperative Study. One of the most ambitious attempts at evaluation
by means of standard tests has been the Cooperative Study of
Secondary School Standards, involving 198 schools and a total of over
300,000 tests. In spite of unusual care to avoid the difficulties summarized
in Table 39, the Cooperative Study concluded that since the results showed
that better methods of evaluation were available for accreditation, the use
of standard tests should be restricted to diagnostic and guidance purposes
by the local school. The Cooperative Study also attempted to judge the
product of the school by follow-up studies of the subsequent careers and

(Editor), The Measurement of Teaching Efficiency, pages 73-141.
New York: The Macmillan Company, 1935.

Traxler, Techniques of Guidance, page 186. New York: Harper & Brothers.

A. S. Barr, “The Use of Measurement in the Management of Teacher Personnel,”
Education, 66: 431-435, March 1946.

Evaluation of Secondary Schools: General Report, op. cit., Chapter VIII.






successes, both academic and nonacademic, of former pupils, and concluded
that this method was mainly of value for local school use. A periodic canvass
of the opinion of pupils about the instruction and other aspects of the
school which they are attending is also a valuable means of self-analysis
and guidance, for, although the customer may not always be right, what
he thinks about the institution is important even when he is mistaken.
In the Cooperative Study pupil judgment also proved to be about as
useful for evaluating schools as the elaborate testing program.


B. General Principles of Evaluation

For elementary schools. The Research Division of the National Education
Association formulated the following statements of guiding principles
for evaluating the programs of elementary schools; most of these
appear equally applicable to the other levels of education:


1. Adequate appraisal of the school includes more than the usual program of
achievement testing.

2. School appraisal should be diagnostic; that is, it should reveal the specific
points of strength and of weakness in the school program.

3. Every aspect of the school program should be appraised regardless of its
relative difficulty.

4. Principals and teachers should play important parts in the appraisal of
their own schools. Their responsibility for planning and initiating appraisal measures
will vary according to the plan of organization and administration in the
school system as a whole.

5. Within reasonable limits and under proper safeguards pupils also may contribute
to school appraisal.

6. Evaluation of the school program should be carried on continuously. Pertinent
information should be collected thruout each year and summarized at least
once a year.

7. Methods of appraisal should be selected on the basis of their reliability,
practicability, and appropriateness in the particular situation to be appraised.
A combination of several methods is often better than one alone.

8. Before undertaking an appraisal, principals and teachers should find out
how competent workers elsewhere have evaluated similar elements of the school
program.

9. Careful subjective judgments formed in the light of valid criteria are better
than conclusions based on objective data from a poorly planned or carelessly
executed experiment.

10. School appraisal should be made with reference to specified criteria of some
kind. Such criteria should themselves be carefully evaluated before they are used.

11. Of the several types of criteria which may be used, those concerned with
pupil development should receive first consideration.

12. There should be agreement between the accepted objectives of a school
and the instruments which the school uses to measure its attainment of these
objectives.

13. When it is impracticable to determine the merits of local school practices
directly, these practices should be appraised with reference to the findings of
available research studies and expert opinion outside the school.


The National Elementary Principal, op. cit., 16: 237-238, July 1937.




Icm " ™“ “ P"””'*'*. ‘>'= '“'IJ informed of L“ 

'* 6''™ nrcurate information concerning 
■mproiTit “ “’'J “V 'll'? 


For secondary schools. The Cooperative Study of Secondary School
Standards has prepared the following eighteen principles, which provide a
comprehensive philosophy not only for evaluating secondary schools, but
also for evaluating other levels of education:


1. American secondary schools, much as they may differ in details, are essentially
alike in their underlying purposes and organization.

2. In a democracy the fundamental doctrine of individual differences is as
valid for schools as for individuals. Schools, as well as pupils, differ from each
other markedly.

3. A school can be studied satisfactorily and judged fairly only in terms of its
own philosophy of education, its individually expressed purposes and objectives,
the nature of the pupils with whom it has to deal, the needs of the community
which it serves, and the nature of the American democracy of which it is a part.
All American schools, however they may differ in type, have this in common:
they are instrumentalities for transmitting our American heritage and our American
democratic ideals. Provided this aim can be clearly kept in view in every case,
each school is free to determine its own educational policies in promoting the ideals
of American civilization.

4. A school should be judged in terms of the extent to which it meets satisfactorily
the needs of all pupils who should come to it, not alone of those who
continue their formal education in institutions of higher training.

5. Methods of accreditation and interpretation of evaluation should recognize
the differences in background, development, and existing conditions in different
states and regions. No attempt should be made to develop uniform standards for
the nation or to have them administered from a single national office.

6. It is more significant to measure what the school does than what it is or
what it has. The educational process and product are more important to evaluate
than the machinery and equipment.

7. A school should be judged as a whole, not merely as the sum of separate
parts.

8. The number of factors evaluated in the modern secondary school should be
sufficiently large and varied to give valid evidence of the worth of the school in each
of its main areas.

9. Accrediting criteria and procedures should be brief enough in extent, sufficiently
varied in form, and convenient enough in arrangement to be practicable.

10. Methods of evaluation and accreditation, in as far as possible, should be based
upon scientific studies and objective evidence, rather than upon untested assumptions
and unsupported opinions.


Evaluation of Secondary Schools: General Report, op. cit., pages 57 ...;
How to Evaluate a Secondary School (1940 Edition), op. cit.





11. The considered judgment of competent educators is an essential factor in the
evaluation of the quality and character of the work of a school.

12. A valid method of evaluation and accreditation, based tentatively upon
existing research studies and expert judgment, should be fully tested by an
experimental try-out in a large group of typical, representative secondary schools
throughout the country. The results of this experimentation should be carefully
analyzed and evaluated.

13. While it is desirable in many respects that definite standards or levels of
achievement should be developed, it is recognized that in most of the important
aspects of a school's work the best available basis for the development of useful
standards will probably be comparison with the practices in other comparable
schools.

14. A good school is a growing school. It should be judged by its progress between
two different dates as well as by its status at a single date.

15. Any useful, stimulating, and valid method of accreditation should be flexible
with the passage of time; that is, it should be capable of reasonable modification
as new bases of evaluation and different levels of achievement are suggested or
developed from the use of existing ones.

16. If criteria for evaluation are sufficiently flexible, extensive, and thorough, it
is not essential that they be applied annually.

17. The bases and methods of evaluation should be such as to require active
participation in the process on the part of the entire professional and non-professional
staffs of the school.

18. An important function of a national, regional, or state agency should be
stimulation toward continuous growth and improvement, not merely inspection
and admission to membership.


For higher institutions. The Committee on Revision of Standards
created by the Commission on Higher Institutions of the North Central
Association of Colleges and Secondary Schools spent five years and
$135,000 in making a study reported in a series of seven monographs.
The section entitled “Institutional Purposes and Clientele,” regarded
as “the very heart of the new accrediting policy,” is as follows:


Recognition will be given to the fact that the purposes of higher education are
varied and that a particular institution may devote itself to a limited group of
objectives and ignore others, except that no institution will be accredited that
does not offer minimal facilities for general education or require the completion of
general education for admission.

Every institution that applies for accreditment will offer a definition of its
purposes that will include the following items:

1. A statement of its objectives, if any, in general education.

2. A statement of the occupational objectives, if any, for which it offers training.

3. A statement of its objectives in individual development of students, including
health and physical competence.

This statement of purposes must be accompanied by a statement of the institution's
clientele showing the geographical area, the governmental unit, or the religious
groups from which it draws students and from which financial support is
derived.


The Evaluation of Higher Institutions, published by University of Chicago Press.

George F. Zook and M. E. Haggerty, Principles of Accrediting Higher Institutions,
pages 150-151. Chicago: University of Chicago Press, 1934.





The facilities and activities of an institution will be judged in terms of the
purposes it seeks to serve.


C. Evaluating Various Aspects of the School

[Figure: “II. Philosophy of Secondary Education. A. Significant Points of View,” a checklist on which the school marks the statements that best express its beliefs. Cooperative Study of Secondary School Standards, Washington, page 8.]

The philosophy of the school. There is rather remarkable agreement
among the foregoing principles for evaluating the three levels of education,
but nowhere is the agreement more notable than upon the point that an
institution must be appraised in terms of its own philosophy and objectives.
The Cooperative Study, for example, recommends that “a secondary
school be studied expressly in terms of its own philosophy of education,
its individually stated purposes and objectives, the nature of the pupils
with whom it has to deal, the needs of the community which it serves, and
the nature of the American democracy of which it is a part.” It recognizes
four distinct phases in the satisfactory evaluation of a secondary school:


1. Statement by the school of its philosophy of secondary education and of its
objectives.

2. Checking and validation of the statements of philosophy and objectives
against the needs of the pupil population and community which the school serves.

3. Revision or modification of the statements of philosophy and objectives, if
necessary, in light of Step number 2 above.

4. Evaluation of all aspects of the school in terms of these revised statements
of philosophy and objectives. This phase involves the use of the rest of the
Evaluative Criteria.


Figure 46 illustrates one of the procedures suggested for formulating the
school's philosophy. Step 2 above indicates briefly the procedure to be followed
in evaluating this philosophy, which is largely a matter of checking
it for clearness, for internal consistency, and for appropriateness to the
community to be served. Regarding the pupils and the community, the
basic data which are required for the external evaluation are as follows:


I. Basic data regarding pupils

   A. Graduates and enrollment by grades and by sex
   B. Number of years seniors have been in the school
   C. Distribution of withdrawals according to cause
   D. Age-grade distribution of pupils
   E. Distribution of I.Q.'s by grades
   F. Educational intentions of seniors by sex
   G. Occupational intentions of seniors by sex

II. Basic data regarding the community

   A. Population data for the school community
   B. Occupational status of adults
   C. Occupational status of youth of secondary school age
   D. Educational status of adults
   E. Financial resources of the school district
   F. Agencies affecting education
   G. Additional socio-economic information (seven items)

The educational program. A satisfactory statement of the school's
philosophy having been formulated, a basis is now available for evaluating
the educational program and organization of the school. The general point
of view and procedure of the Cooperative Study is given in the “Instructions”
reproduced in Figure 47.

The following list of educational “temperatures” indicates the comprehensive
character of the evaluation of the educational program:

How to Evaluate a Secondary School (1940 Edition), op. cit., page 60.
Evaluative Criteria (1940 Edition), page 6. Washington, D. C.: Cooperative Study
of Secondary School Standards.

Ibid., pages 17-28.




1. Curriculum and Course of Study

General principles; curriculum development; ...; health and physical
education; ... vocational shop; general evaluation.

2. Pupil Activity Program

...; home rooms; school assembly; ... and speech; social life; physical
activities of boys; physical activities of girls; school clubs; finances; general
evaluation.


[Figure 47. Instructions for Using the Evaluative Criteria (1940 Edition). Cooperative Study of Secondary School Standards, Washington. The instructions fall under three headings: General, Checklists, and Evaluations. Each checklist item is marked with one of four symbols:

   +   condition or provision is present or made to a very satisfactory degree
   -   condition or provision is present to some extent or only fairly well made
   0   condition or provision is not present or is not satisfactory
   N   condition or provision does not apply

Evaluations are made on a five-point rating scale in the light of the marked checklist and of personal observation and judgment. The highest rating, 5 (“very superior”), means that the provisions or conditions are present and functioning to the extent found in approximately the best 10% of regionally-accredited schools; N is used where a section does not apply. Both the checklist symbols and the rating figures are to be regarded merely as convenient symbols, not mathematical quantities.]
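
As a minimal sketch of how a marked checklist might be summarized before an evaluation is made, the four checklist symbols (+, -, 0, N) can simply be counted. The function name and the sample marks below are hypothetical. In keeping with the instructions, the symbols are tallied only and are not treated as numbers.

```python
from collections import Counter

# The four checklist symbols described in the instructions.
SYMBOLS = ("+", "-", "0", "N")

def tally_checklist(marks):
    """Count how often each checklist symbol was used.

    marks: iterable of strings, one symbol per checklist item
           (hypothetical input format).
    """
    marks = list(marks)
    unknown = set(marks) - set(SYMBOLS)
    if unknown:
        raise ValueError(f"unrecognized symbols: {sorted(unknown)}")
    counts = Counter(marks)
    # Report every symbol, including those never used.
    return {symbol: counts[symbol] for symbol in SYMBOLS}

# Hypothetical marks for a five-item checklist.
print(tally_checklist(["+", "-", "+", "N", "0"]))
```

The tally is only an aid to the evaluator's judgment; the rating itself still rests on observation and on all other available evidence.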





3. Library Service

Library staff; organization and administration; book collection, number of
titles; book collection, recency; book collection, general adequacy; periodicals;
supplementary materials; selection of materials; teachers and the library; use
by pupils; general evaluation.

4. Guidance Service

Nature and organization; guidance staff; information about pupils; guidance
procedures; phases of guidance; results; general evaluation.

5. Instruction

Classroom activities; use of community; textbooks; methods of appraisal;
special committee judgment; general evaluation.

[Figure 48. Summary of Evaluative Criteria: graphical summary of the evaluations (“educational temperatures”) for a secondary school. Cooperative Study of Secondary School Standards.]


6. Outcomes

Evaluation procedures; attainment in the principal subject-matter fields;
attitudes and appreciations; general evaluation.

Figure 48 illustrates the graphical summary of these “temperatures” for
one secondary school. It will be noted that this school, which happens to be
a large public school, is rated “average” (between the 30th and 70th percentiles)
in four of the six areas of the educational program. The school is
only at the 11th percentile, or “inferior,” in the curriculum, however, and
at the 80th percentile, or “superior,” in library. The graphical device employed
makes it possible to see at a glance the strong and weak points of
the program.
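
The relation between a percentile and its verbal rating can be sketched as a simple lookup. The “average” band (30th to 70th percentile) and the two examples (11th percentile “inferior,” 80th percentile “superior”) come from the text. The cut-offs at the 10th and 90th percentiles for the two extreme ratings, the treatment of values falling exactly on a boundary, and the function name are assumptions made for illustration.

```python
def rating_for_percentile(percentile):
    """Return the verbal rating for a percentile between 0 and 100.

    Cut-offs at 30 and 70 follow the text; those at 10 and 90
    are assumed for the two extreme ratings.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if percentile >= 90:
        return "very superior"
    if percentile >= 70:
        return "superior"
    if percentile >= 30:
        return "average"
    if percentile >= 10:
        return "inferior"
    return "very inferior"

# The two areas singled out in the text, plus a mid-range value.
for area, pct in [("curriculum", 11), ("library", 80), ("guidance", 50)]:
    print(area, pct, rating_for_percentile(pct))
```

Only the curriculum and library percentiles are taken from the text; the guidance value is a hypothetical example of an area in the average band.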


[Figure 49. An evaluative procedure for the content of the offerings in the principal subject-matter fields, showing the checklist and evaluations entered in a table with one column for each field.]


Figure 49 shows the checklist and evaluations proposed for the content
of the offerings in the principal subject-matter fields. Of these, only the
two with the double stars at the bottom of the columns are included in the
short Gamma Scale of 25 thermometers. These two are also included in
the Beta Scale, together with the three additional fields indicated by single
stars. The others appear only in the complete Alpha Scale of 110 thermometers.

Bruner proposed an elaborate set of criteria for judging courses of
study. A gross scale of four points, Excellent, Good, Fair, and Poor, is
provided. The following ten questions, for example, are suggested for judging
the extent to which the course of study is based upon psychological
principles of learning:


1. Is each new learning act considered to be in some degree remaking the
whole organism?

2. Is self-activity considered fundamental to learning?

3. Is study conceived of as an attack upon the “situation,” and what is
learned is learned as and because it is needed for the control of this
situation?

4. Are provisions made for taking into consideration the underlying principles
of integration?

5. Are the activities and materials organized into patterns which, if used,
assist in the better growing of the individual?

6. Is the position held that the learner should experience satisfaction from
engaging in activities?

7. Is knowledge considered as a means to enable the individual to participate
more effectively in life situations?

8. Is significance attached to pupil meanings and insights?

9. Is the view held that growth and learning are continuous throughout the
life of the individual?

10. Is provision made for making the situations of the school real and ...

The Cooperative Study suggests a variety of procedures for evaluating
the library service of the secondary school. On the assumption that the
library service should be a center of the educational life of the school and
not merely a collection of books, it is asserted that adequate provisions for
the school library should include the following:

(1) ...; (2) books and periodicals to supply the needs for reference ... and
cultural and inspirational reading; (3) provision for keeping all materials
catalogued and well organized; (4) a budget which provides adequately for
the maintenance and improvement of the library; (5) encouragement of the
pupils in the development of the habit of reading and enjoying books and
periodicals of good quality and real value.

f UStld 3?To? 12?"Nremte

Ibid., page 111.

Evaluative Criteria (1940 Edition), op. cit., page 51.


Figure 50 illustrates the derivation of three measures of the adequacy
of the book collection. It will be noted that books of the various classifications
are weighted unequally in obtaining the composites. The two extremes
are books on philosophy, with a weight of 1, and books on history,
travel, and biography, with a weight of 20.
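
The unequal weighting of book classifications can be sketched as a weighted composite. Only the two extreme weights (philosophy, 1; history, travel, and biography, 20) come from the text. The third classification and its weight, the sample ratings, the function name, and the choice of a weighted mean rather than a weighted sum are all assumptions made for illustration.

```python
# Weights per book classification. Only the first two entries are from the
# text; "science" and its weight of 10 are hypothetical placeholders.
WEIGHTS = {
    "philosophy": 1,
    "history_travel_biography": 20,
    "science": 10,
}

def weighted_composite(ratings, weights=WEIGHTS):
    """Weighted mean of the ratings given to each book classification.

    ratings: dict mapping classification name to a numeric rating
             (hypothetical input format).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    total_weight = sum(weights[name] for name in ratings)
    weighted_sum = sum(ratings[name] * weights[name] for name in ratings)
    return weighted_sum / total_weight

# Hypothetical ratings: the heavily weighted classification dominates.
sample = {"philosophy": 5, "history_travel_biography": 3, "science": 4}
print(round(weighted_composite(sample), 2))
```

With weights this unequal, a high rating for the philosophy collection barely moves the composite, while the rating for history, travel, and biography largely determines it.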

[Figure 50. III. Adequacy of Library Materials: A. Book Collection.]

Figure 51 shows the section on Teachers and Libraries and illustrates a
different technique. This section seeks answers to two important questions:
First, how extensively do the teachers make personal use of the resources
of the library in promoting their own professional growth and in their
classroom planning and teaching? Second, how effectively do the teachers
stimulate pupils to use the library materials?

[Figure 51. Techniques for the Library Service of a Secondary School: V. Teachers and Libraries (A. Personal Use; B. Stimulation of Pupils) and VI. Use of Libraries by Pupils, each with a checklist, supplementary data, and evaluations. Cooperative Study of Secondary School Standards.]

The Cooperative Study recognized five areas of guidance responsibility
in the secondary school. These are regarded not as distinct types of guidance
but rather as phases of an interrelated unitary process. These phases,
together with the number of items in the checklists and the evaluations
sought, are, in summary, as follows:

A. Educational Guidance, 28 items

   1. Articulation with lower schools

      How effective are procedures for articulation with lower schools?

   2. Curricular and school guidance

      How adequately is guidance provided in such matters as planning a
      sequence of studies, remedying study difficulties, etc.?

   3. Guidance concerning the post-secondary school

      How adequate are provisions for assisting pupils in choices involving the
      post-secondary school?

B. Vocational Guidance and Placement, 14 items

   How adequate are provisions for assisting pupils to make wise vocational
   choices? How adequate are provisions for placement and follow-up service?

C. Guidance in Use of Leisure Time, 6 items

   How adequately are pupils assisted in making wise choices of leisure activities?

D. Social and Civic Guidance, ... items

   How adequately are pupils assisted in making wise choices in matters involving
   social and civic relationships?

E. Personal Guidance, ... items

   How adequately are pupils assisted in making wise choices in personal
   matters?

[Figure 52. Summary Form: V. Guidance Service, showing the summary, the conversion table, and the summary conversion table.]

Figure 52 illustrates the computation of the summary score for the guidance
service of the school when the Alpha Scale is used. It will be seen that
the various evaluations are entered in spaces provided and then averaged.
These point scores are next expressed in percentiles by the use of the standard
conversion table. These percentiles are then weighted to obtain a summary
score. The equivalent percentile is found from the summary conversion
table at the right of the figure. The arrows indicate the sequence of
events in the use of the tables. Similar conversion tables are used for the
other phases of evaluation.
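
The sequence just described (average the evaluations, convert to percentiles, weight, convert the summary score) can be sketched as follows. Both conversion tables, the weights, the sample evaluations, and all function names are hypothetical stand-ins; the actual tables appear only in the figure, and the summary score is assumed here to be a plain weighted sum.

```python
from bisect import bisect_right

# Hypothetical conversion tables: (minimum score, percentile) pairs in
# ascending order. The real tables are those printed on the summary form.
CONVERSION_TABLE = [(1.0, 5), (2.0, 20), (3.0, 50), (4.0, 80), (5.0, 95)]
SUMMARY_TABLE = [(0, 10), (100, 40), (200, 70), (280, 95)]

def average(evaluations):
    """Mean of the numeric evaluations for one phase, skipping 'N' entries."""
    scores = [e for e in evaluations if e != "N"]
    if not scores:
        raise ValueError("no numeric evaluations to average")
    return sum(scores) / len(scores)

def lookup(table, score):
    """Percentile for the highest table threshold not exceeding the score."""
    thresholds = [threshold for threshold, _ in table]
    index = bisect_right(thresholds, score) - 1
    return table[max(index, 0)][1]

def summary_percentile(phases):
    """phases: list of (evaluations, weight) pairs, one per phase."""
    summary_score = sum(
        weight * lookup(CONVERSION_TABLE, average(evaluations))
        for evaluations, weight in phases
    )
    return lookup(SUMMARY_TABLE, summary_score)

# Two hypothetical phases with weights 2 and 1.
example = [([3, 4, "N", 5], 2), ([2, 3], 1)]
print(summary_percentile(example))
```

In this hypothetical case the averages 4.0 and 2.5 convert to the 80th and 20th percentiles, the weighted summary score is 180, and the summary table returns the 40th percentile.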

The quality of instruction in the school is judged by having the work
of each member of the teaching staff considered from the following points
of view:

A Classroom Activities 

1 The teacher’s plans and activities 

2 Cooperation between pupils and teachers

B Use of Community and Environment

C Textbooks and Other Instructional Materials

1 Textbooks 

2 Other instructional materials 
D Methods of Appraisal 

E Special Committee Judgment 

Figure 53 reproduces the last pages of this evaluation and illustrates the
procedure.

The philosophy underlying the evaluation of the outcomes of the
educational program is clearly stated in the following guiding principles:

Evaluative Criteria (1940 Edition), op. cit., page 160.
** Ibid page 8.J 



EVALUATION OF SCHOOLS

In the educational program of a good secondary school major concern should be
given to attaining desirable outcomes and to the various kinds of evidence indicating
that such outcomes are being realized. It may be necessary to test some outcomes
by departments or in class groups. This, however, should not be construed as
limiting the responsibilities of all phases of the educational program, including the
instructional activities of teachers, pupil activity program, guidance service, library
service, school plant, and school administration, for the attainment of desirable
outcomes. There should be evidence that teachers and pupils are happily and
harmoniously cooperating in the stimulation of a wholesome curiosity about
themselves and their environment. Evidence should be sought to show that pupils are
securing knowledge and developing worthwhile skills, attitudes, tastes,
appreciations, and habits. There should be evidence that pupils are able to make desirable
choices or to exercise good judgment in the selection of friends, vocations, leisure
activities, goods and services, and in other important matters which confront
youth today. Evaluation of such activities involves more than determining the
amount of knowledge possessed, measuring the degree of skill, and testing the scope
of understanding, important and necessary as all these are. Among others,
intangible qualities such as cooperativeness, tolerance, open-mindedness, reverence,
respect for law, and self-reliance are highly desirable outcomes. Evaluation of such
outcomes is by no means easy; for most of them there is no standard measure and


[Figure 53. Evaluation of the quality of instruction in a secondary school: D. Methods of Appraisal (checklist and evaluations) and E. Special Committee Judgment.]


therefore evaluation of them necessarily will be largely a matter of judgment. The
difficulty of the task is no reason for avoiding it, and the importance and
universality of the problems involved make it imperative that attention should be
directed to the attainment of such outcomes and to their proper evaluation.


Another useful instrument “designed to serve as a basis for the appraisal
of individual school systems with respect to their adaptation to current
educational needs” has been prepared by Mort and Cornell.** It covers
much the same scope as the Cooperative Study, but the technique is
different. Specific questions are raised, to be answered Yes or No, with places
for the supporting data at the left of the page. The scores for each section
are then entered on a special score sheet, and by a simple process of
weighting are combined into a single score. Table 40 gives the summary of this
score sheet. Tentative norms are available for school systems of various
sizes located in four states. The first of the ten parts of the section, which
attempts to determine the degree to which the educational program
recognizes the nature and extent of individual differences in pupils, is as
follows:


a Intelligence Tests. Group and individual intelligence tests should be used
as one of the means of analyzing problems of maladjustment.

Q How many of your pupils have been given intelligence tests?

Interview: Principal, Guidance director, Psychologist (if any)

Observe: Individual records, Test records

Evidence:

1 Individual intelligence tests have been given in special cases. Yes No

2 A group intelligence test is given to all elementary and first year high
school pupils at least once in three years. Yes No

3 Intelligence tests results, for tests given both in elementary and in high
school, are made a part of the permanent record of the child. Yes No

4 Educational and intelligence tests results (group and individual) constitute
one of the elements upon which guidance is based. Yes No


Paul R. Mort and Francis G. Cornell, A Guide for Self-Appraisal of School Systems.
59 pages. New York: Bureau of Publications, Teachers College, Columbia University,
1937.

Ibid., page 26.





The school organization and plant. Numerous checklists and score
cards have been prepared for rating school buildings, equipment, and
administrative procedures. In this field the pioneer work of George D.
Strayer, N. L. Engelhardt, and their students is especially notable. Two
developments will be described briefly. The scope of the evaluation
procedures devised by the Cooperative Study is apparent from the following
outlines of evaluative criteria:


A School Staff

[...] selection, qualifications, improvement

3 Nonprofessional staffs: qualifications, improvement, and conditions

4 [...] characteristics of the school staff

5 General evaluation

B School Plant

1 [...] health and safety, economy and efficiency, influence on the educational program

2 [...] health and safety, economy and efficiency, influence on the educational program

3 [...] health and safety, economy and efficiency, influence on the educational program

[...] of the school plant

6 General evaluation

C School Administration

1 [...] staff: numerical adequacy, preparation and qualifications

2 [...] general policies, superintendent of schools

3 Supervision of instruction: objectives, procedures and activities, principles

4 [...] accounting

5 [...] characteristics of the school administration

6 General evaluation

That the scope of analysis proposed by Mort and Cornell is some[...]
sections relating to the [...]

[...] by Bureau of Publications, Teachers College, Columbia University,
New York.



TABLE 40

SUMMARY OF THE MORT-CORNELL SCORE SHEET FOR THE SELF-APPRAISAL OF
SCHOOL SYSTEMS

                                                 Adjustments Possible   Maximum Score
                                                 Section     Total      Section   Total

I Classroom Instruction
  A The curriculum                                             [?]
    1 Flexibility of curriculum
    2 Breadth of curriculum
    3 Courses of study
  B Pupil activity                                             28                    190
    1 Fields of learning
    2 Extracurricular activities                   [?]
    3 Instructional materials                       8

II Special Services for Individual Pupils
  A Pupil records and attendance                                                     104
    1 Educational accounting                        7
    2 Census and attendance                        [?]
  B Provisions for individual differences                      20                    150
    1 Guidance, educational and vocational          7                      42
    2 The individual and the educational program   10
    3 Health service                                9

III Educational Leadership
  A Supervision and school organization                        21
    1 Professionalization of personnel             [?]                     40
    2 Supervision of instruction                    8                      40
    3 Grade and subject organization                6                      25
  B School administration and the community                    21                    105
    1 Administrative planning                       6                      30
    2 Status of control                             7                      35
    3 Scope of school influence in the community    8                      40

IV Physical Facilities and Business Management
  A The school plant                                           30                     90
    1 School plant planning                         5                      15
    2 The school site                               5                      15
    3 School buildings                             10                      30
    4 Special rooms                                10                      30
  B Business management                                        14                     42
    1 Supplies and equipment                        7                      21
    2 Financial accounting                          7                      21

Total                                                         183                   1008
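
The "simple process of weighting" by which section scores are combined can be illustrated with the Part IV figures from the score sheet summary, where each section's maximum score is three times its number of possible adjustments. The sketch below is hypothetical: it assumes that each adjustment answered Yes counts once and that a section's weight is its maximum score divided by its adjustments possible, which is an inference from the summary figures and not a rule stated by Mort and Cornell.

```python
# (adjustments possible, maximum score) for Part IV of the score sheet summary.
PART_IV = {
    "School plant planning": (5, 15),
    "The school site": (5, 15),
    "School buildings": (10, 30),
    "Special rooms": (10, 30),
    "Supplies and equipment": (7, 21),
    "Financial accounting": (7, 21),
}


def weighted_score(yes_counts, sections=PART_IV):
    """Combine per-section Yes counts into one weighted score.

    Assumption: weight = maximum score / adjustments possible.
    """
    total = 0.0
    for name, yes in yes_counts.items():
        possible, maximum = sections[name]
        if not 0 <= yes <= possible:
            raise ValueError(f"{name}: Yes count must be between 0 and {possible}")
        total += yes * (maximum / possible)
    return total


# A hypothetical school answering Yes to some adjustments in three sections.
example = {"The school site": 3, "School buildings": 8, "Financial accounting": 7}
print(weighted_score(example))
```

Under these assumptions a school answering Yes to every Part IV adjustment would reach the combined maximum of 90 + 42 = 132 shown for sections A and B.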









B ECONOMY AND EFFICIENCY**

CHECKLIST

( ) 1 The site is readily accessible to the school population
( ) 2 [...] accessible over hard surfaced roads and adequate walks

( ) 4 Play areas are readily accessible

( ) 7

( ) 8

EVALUATIONS

( ) x How accessible is the site?
( ) y How extensive is the site?

( ) z How well adapted is the site for future expansion?

Comments


THE SCHOOL SITE**

d Adaptability. Each school site should be laid out and developed
in consideration of both present and estimated future needs. Yes No

Q What is the average size of the elementary school sites? Of high
school sites? How do you justify the size?

Interview: Superintendent

Observe: Plans and data on future growth, areas of present sites, and
enrollment of schools

Evidence:

1 The superintendent has developed plans for the layout of each permanent
site in terms of estimated changes in enrollment and educational
program. Yes No

2 Present development of sites is such that adjustments can be made with
minimum of cost to expanding enrollment and program. Yes No


An index of variation. N. L. Engelhardt, Jr.,** has prepared an index
of variation, which is based upon the assumption that any true equalization
of educational opportunities must provide for variability, rather than
uniformity, in the school plant and program. The needed variation must
consider the impact upon the pupil of the physical and social characteristics
of the environment, as well as the personal qualities of the people to be
served. For example, the thirty-eight factors that should be taken into
account in determining the educational program of a community relate to
the age distribution of the population, the health conditions, the housing
conditions, and the social conditions of the community.

Evaluative Criteria (1940 Edition), op. cit., page 116.

A Guide for Self-Appraisal of School Systems, op. cit., page 49.

The Report of a Survey of the Public Schools of Pittsburgh, Pennsylvania, pages
[...]. New York: Bureau of Publications, Teachers College, Columbia University,
1940. Also see The American School and University Yearbook for [...].

Concluding statement. Evaluation is by no means a new idea in
education, although the concept has been greatly enlarged in recent years.
Many new techniques have been devised to supplement those already in
existence, and in some cases to supplant them altogether. Much remains
to be done, however. In the meantime educators should acquaint
themselves with the uses and limitations of the techniques which have been
developed. There is no escaping the fact that evaluation is one of the most
difficult, as well as one of the most important, problems in the modern
school. The best existing evidence that a school is good is the fact that it
is continually studying to find ways to improve itself.




16 


Public Relations


A. The Problem

The following remarks by Robert L. Thorndike, though not pertaining
directly to school situations, might well be applied to them:*

In personnel selection, as in most fields, there is no lack of polished individuals
who present in a compelling manner some completely unscientific and unvalidated
technique. It is often true, unfortunately, that the best salesmanship is
applied to the poorest product. The temperament which is disposed to careful and
exacting research tends not to take kindly to or have a gift for promotion. But it is
just this sound scientific worker who must in self-defense develop effective
promotion for his service. The layman does not have the background to discriminate
between effective personnel research and quackery. He must be educated and
trained to discriminate between the tested results of a sound personnel system and
the unfounded claims of the quack. The more scientific and rigorous a personnel
research worker is, the more important it is for him carefully to consider the public
relations side of his work.


Thirty-six years earlier his father and Kandel had quoted a nineteenth
century educator writing in a similar vein:

Much of the scepticism prevalent as to the power and value of popular
education arises from the inability of the educationist or of the school teacher to adduce
satisfactory statistical evidence of the moral or of the intellectual results from any
special courses of instruction or training as manifested in after life.*


Personnel Selection: Test and Measurement Techniques, page [?]. New York: John
Wiley & Sons, 1949.

E. L. Thorndike and Isaac L. Kandel, in “Educational Measurement of Fifty
Years Ago,” [...] November, 19[?], from E. Chadwick, [...] Quarterly Magazine of
Education, Literature and Science, 6: 480-484, 18[?]4.



[...] or the stress schools place upon the spectacular in education to the
[...] children. These statements indicate that the public and the school do not
always understand each other, and consequently do not work together to
their mutual advantage.

The meaning of public relations programs. Narrowly conceived,
the public relations program of the school is synonymous with the publicity
activities of the school. In recent years, however, the terms publicity and
propaganda have become so closely associated and so discredited in the
public mind as to arouse suspicion that something sinister is about to be
“put over.” Broadly conceived, public relations is merely one important
aspect of the school’s program of adult education. Its primary aims are
two: (1) better understanding by the public of the purposes, programs,
accomplishments, and needs of the school, and (2) better understanding
by the school of the desires and needs of the community as reflected in the
educational views of the public. In other words, its purpose is to effect the
maximum co-operation between the community’s two most important
educational institutions, the home and the school. And it must be remembered
always that the child is the connecting link between them.
A prominent educational leader* includes among the important purposes
of measurement and evaluation in the modern school the provision of
“psychological security to the school staff, to the pupils, and to the
parents,” and a “sound basis of public relations.” Concerning the latter
Tyler says:

No factor is so important in establishing constructive and co-operative relations
with the community as an understanding on the part of the community of the
effectiveness of the school. A careful and comprehensive evaluation should provide
evidence that can be widely publicized and used to inform the school community
about the value of the school program. Many of the criticisms of the school
expressed by the taxpayers and parents can be met and turned into constructive
co-operation if concrete evidence is available regarding the accomplishments of the
school.

There are several reasons for thinking that the problem is becoming
increasingly important and difficult as the years go by. The enlarged
enrollment in the secondary school and the accompanying expansion in the
school program have brought many changes which the public does not
understand. This fact is mainly responsible for the common charge that
the modern school curriculum is cluttered up with all sorts of useless “fads

Lester S. Ivins, “What Parents Expect of the School,” Journal of the National
Education Association [...]

Ralph W. Tyler, “The Place of Evaluation in Modern Education,” Elementary
School Journal, 41: 19-27, September, 1940.




and frills.” The increasing burden of taxation has naturally made the
citizens critical of all public expenditures. Since in most communities
the public school system is the biggest public business, it is likely to
bear the brunt of the attack. Nor should one overlook the stubborn fact
that the enormous expansion of such enterprises as are provided for by
the social security and old age pension legislation has greatly increased the
competition for the taxpayer’s dollar. In such a situation it is especially
well to keep in mind the wise statement of President Madison: “A popular
government without popular information or the means of acquiring it is
but a prologue to a farce or a tragedy, or perhaps both.”

The principal sources of “popular information” may be conveniently
grouped as follows:

1 Ordinary agencies: local newspapers, student publications

2 Official publications: reports, bulletins, handbooks, etc.

3 Report cards and letters to parents

4 Miscellaneous: public programs, exhibits, P.T.A., etc.

Each of these will now receive brief discussion.


B. Ordinary Agencies of Public Information


Local newspapers. As a medium for bringing about desirable relations
between the school and its public, the newspaper ranks high. For most
people it is the principal source of information, but school news as reported
in the local paper is likely to be narrow in scope and lack proportion. An
extensive early study by Farley* revealed that as a rule the patrons
received least information on the school topics in which they were most
interested and most information on school topics in which they were least
interested. Table 41 summarizes the situation.*

It may not be surprising, but it is certainly unfortunate, to find that in
the typical newspaper the total space devoted to the first six items in order
of patrons’ interest was less than half that given to extracurricular
activities, which stand at the bottom of the list. Both the school and newspaper
appear to take for granted the excellent work of the classroom, which,
therefore, falls in the “dog-bites-man” category rather than in the news
classification.* They both apparently forget that a report of the incident is the
most interesting thing in the world to the owner of the dog as well as to
the man who has been bitten. Parents never tire of hearing good reports
of their own children. There is no good reason why the educational side
shows should be allowed to swallow up the main tent. Farley calls attention
to these facts:*


Belmont Mercer Farley, What to Tell the People about the Public Schools. 136 pages.
Teachers College, Columbia University, 1929.

Ibid., adapted from pages 16 and 49.

For an instructive discussion of this point see Edwin J. Brown, Secondary-School
Administration, pages 270-271. Boston: Houghton Mifflin Company, 1938.

Ibid., pages 16-17.





TABLE 41

Topics of School News                        Rank According to
                                             Patrons' Interests   Space in News

Pupil progress and achievement
Method of instruction
Health of pupils
Courses of study
Value of education
Discipline and behavior of pupils
Teachers and school officers
Attendance
Buildings and building program
Business management and finance
Board of education and administration
Parent-teacher association
Extracurricular activities


In other words, patrons wish to know what their children are being taught, how
they are being taught, what results are being achieved, and how the public schools
affect the physical welfare of their children. They are ready to listen to the
educator tell them that the results achieved in the schools are desirable, that they
are achieved by efficient scientific methods, that children are taught useful habits
and skills, that their physical welfare is not neglected.


Student publications. Student publications should occupy a strategic
position in any public relations program. They represent activities that
have educational value in themselves and thus constitute important
exhibits of the actual work of the school. Of these publications the school
newspaper and the yearbook or annual are most important. Since they are
written primarily for the pupils and patrons of the school, these
publications can portray the actual operation of the school program more fully
than the general newspaper, which must appeal to a wider public. What the
student does is always of interest to other students and to parents, but
examination of the student publications of most schools would probably
reveal a very distorted picture of the school situation. As in the regular
newspaper, the extracurricular program looms large. The reader can
scarcely escape the conclusion that the school year is largely occupied with
social affairs and athletics. Those who criticize public education as an
exponent of “fads and frills” could hardly do better than introduce the
yearbook as Exhibit A. Beside the stadium, the library dwindles into
insignificance, and such things as classrooms and laboratories are deemed so
unimportant as to be omitted altogether. It is not too much to expect that
the student publications present a truer picture of the school, giving greater
prominence to those features which justify its existence. That the public



is genuinely interested in these, there can be no doubt. Certainly parents
would put evidence of pupil progress and achievement at the top of the list.

C. Official Publications

Annual reports. The earliest record of a formal written educational
report was made in Boston, Massachusetts, in 1738, although informal
oral reports had been made to town meetings in New England at an earlier
period.* It is clear that from the outset the primary function of such reports
has been to inform the public regarding the aims, progress, and needs of
the schools and to afford an intelligent basis for determining educational
policies. The first written report, for example, gave the enrollment in each
school and included comments by the visiting committee on the quality
of instruction. The function of such reports was well stated in the
introductory pages of the 1841-1842 report of Fall River, Massachusetts, as
follows:

Those who are taxed to support Public Schools have a right to know how their
money is expended and what is the character of the schools which they are
required to maintain. The committee are but the agents employed by the town to
take the agency of Common School Education, and the employer ought to be
made acquainted with all that appertains to his interest in respect to this agency.
What the committee knows as to the schools, the town ought to know.

Since the appearance of standardized tests, the annual reports often
describe the tests used and the purposes for which they are employed, and
give summaries of the results. Some cities make effective use of graphs to
show that progress in the tool subjects is regular from grade to grade, as
well as profile charts to illustrate the use of standard tests in the diagnosis
and guidance of individual pupils. There is no way better than test results
to show the need for curriculum changes, guidance services, ungraded
classes, and other provisions for individual differences. There can be little
doubt that parents are interested in receiving not only an account of how
the money for public education was spent, but also of what it bought in
the way of an efficient educational program.

But most school reports have one fatal weakness: They are not read.
The reason for this has been stated as follows: “Most official reports are
dull. Their authors, though they have the most interesting material in the
world, treat it perfunctorily, statistically, as lifeless stuff to be put away
in mortuary files.” The problem with school reports, as G. Stanley Hall
long ago pointed out in the case of moral education, is how to make virtue
exciting.

Ward G. Reeder, An Introduction to Public School Relations, pages 8[?]-87. New York:
The Macmillan Company, 1937.

Quoted from M. G. [...], School Reports as a Means of Securing Additional [...]
for Education in American Cities, pages 4-[?]. Columbia, Mo.: Missouri Book Company.

Editorial in The New York Times, January 4, 1926.

For a good discussion see Ward G. Reeder, op. cit., pages [?]0-104.





Special reports and publications. It must be recognized that nothing
is great or small, good or bad, except by comparison. Because of this fact
school surveys, which attempt to interpret the local schools in relation to
those of other systems of similar size, are important. At times such studies
made by impartial outside agencies are especially effective. It is even
better, perhaps, to have a continuous self-survey, and to report at strategic
intervals various phases of the school program. The larger cities employ
for this purpose bulletins or magazines modeled after the house organs of
industrial organizations. Graphical comparisons of standardized test scores
with national norms may be so reported.

A common criticism of the modern school program is that it has allowed
the newer “fads and frills” to displace the older “fundamental subjects.”
People long for “the good old days when people really learned something
when they went to school.” The most effective argument with which to
meet such criticism is a comparison of the achievement of the older schools
and the newer, or of the traditional school program and the more liberal
program of today. Riley made such a study in Springfield, Massachusetts,
of the results of tests in 1906 that had first been given to children in the
city sixty years earlier, in 1846. The findings, briefly summarized below
in terms of percentage of correct responses, were favorable to the later
schools:


                    Percentage Correct
Subjects            1846        1905-1906

Arithmetic          29.4        65.2
Spelling            40.6        61.2
Geography           40.3        53.4


Fish* made a somewhat similar study comparing the achievement of
Boston children in 1928 with that of pupils in the city on the same tests
in 1853, seventy-five years earlier. Again the results, expressed in terms
of errors made, favored the later schools.



J. L. Riley, The Springfield Tests. Springfield: The Holden Patent Book [...]
Company, 1908.

Louis J. Fish, Examinations Seventy-Five Years Ago and Today. Yonkers: World
Book Company.





These studies suggest preserving the results of standardized tests so that
at intervals of perhaps ten or twenty years they can be compared with
current results on these tests, which will afford convincing evidence of
trends in efficiency. A study of this type, covering achievement in
Philadelphia high schools for a ten-year period, has been made by Boyer and
Gordon,* and a study of arithmetic for a twelve-year period in St. Louis
has been made by Boss.*
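
A comparison of preserved results with current results on the same tests amounts to a subject-by-subject difference in percentage correct. The short Python sketch below illustrates the idea; the subjects and percentages are hypothetical and are not drawn from the Springfield, Boston, Philadelphia, or St. Louis studies.

```python
def percent_correct_change(earlier, later):
    """Return later minus earlier percentage correct, by subject.

    Only subjects tested on both occasions are compared.
    """
    return {
        subject: round(later[subject] - earlier[subject], 1)
        for subject in earlier
        if subject in later
    }


# Hypothetical records preserved from an earlier administration,
# and current results on the same tests.
earlier_results = {"Arithmetic": 52.0, "Spelling": 60.5}
current_results = {"Arithmetic": 58.5, "Spelling": 59.0, "Reading": 70.0}

for subject, change in percent_correct_change(earlier_results, current_results).items():
    print(f"{subject}: {change:+.1f} points")
```

A positive difference favors the later administration. The comparison is meaningful only when the same test, scored in the same way, was given on both occasions.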


D. Report Cards and Letters to Parents

Trends in report cards. For many years report cards have furnished
the most direct line of communication between the home and the school.
They have ordinarily consisted of a record of the pupil’s attendance and
academic achievement, expressed in teachers’ marks, sent to the parent at
intervals of a month or six weeks. In recent years, however, certain
important changes have taken place. In a comprehensive survey of the literature
relating to report cards, Messenger and Watts* noted the following trends:

1 There is general dissatisfaction with any scheme of grading that encourages
the comparison of pupils with each other.

2 If any grades are used, a scale with fewer points is favored, a three-point
scale being most often recommended.

3 There is a widespread feeling that the schools should evaluate traits other
than mere subject-matter achievement.

4 There is a clear tendency to use descriptive rather than quantitative reports.

5 Report cards are being displaced by notes or letters to parents.

6 Cards, notes, or letters are being sent at less frequent intervals, and in some
schools only when there is specific occasion for such communications.

7 Attempts are being made to give more detailed diagnosis of pupils’
achievements.

8 Parents are being asked to cooperate in building report forms.

9 Pupils are cooperating both in devising report cards and in evaluating their
own accomplishment.


A study* of trends in nine western states indicated that these changes
were more marked in the elementary than in the secondary school. This
study notes a wholesome effect on the personalities of the pupils, the effect
being especially marked for those of lower ability. The most noticeable
effect, however, appears to be in improved teacher-pupil relationships.
There is also evidence that these newer systems of reporting are often
approved by the parents. After six years’ experience, for example, one
writer makes this positive statement: “The letter fosters a much more
co-operative relation between home and school.” Morrisett* reports a study
in which the principal of a large junior high school submitted a list of forty
items to the parents with the instruction to check “items in which you are
most interested, that is, those items about which you would like to know
more.” The item “What parents can do to promote pupil accomplishment”
ranked first. Other items high in the list clearly indicated that parents
desired more information regarding educational and vocational guidance.
The weakness of the older report card was just here. The information
supplied to parents, even if its accuracy could be assumed, was of such a
general character as to be of little help in either diagnosis or guidance, in
which full co-operation with the home is most needed.

[...] “[...] Schools Neglected Academic Achievement,” School and Society,
49: 810-812, June 24, 1939.

[...] “Arithmetic Then and Now,” School and Society, 51: 391-[?], March 23, 1940.

[...] “Summaries of Selected Articles on School [...],” [...] 21: 539-550,
October, 1936.

[...] School[...], 24: 51-5[?], November, 1939.

Evans has traced the evolution of the report card. He notes a definite
trend away from the standardized printed card and toward a more flexible,
informal report that is better adapted to local conditions and needs.
There is an increasingly clear recognition that the function of reporting
is interpretation rather than presentation, with the emphasis on progress
rather than ...

In a study of report cards, Hill analyzed 443 report cards from towns and
cities of all sizes, representing all educational levels and ... He
concluded that a satisfactory report card should:

... accord with changes in educational standards and educational
philosophy ...

... be broad enough to cover all important outcomes: subject achievement,
character ..., adjustment, health, and ...

5. Give an ... sympathetic understanding of the child.

... basis of reporting flexible enough to account for the ...

... account of pupil progress understandable and ...

... Bring about closer co-operation and greater mutual understanding
between ...

... for reciprocal reporting. [That is, space for suggestions or ... from
the parent.]

... basic abilities and ...

11. Rate achievement in relation ...

12. Rate achievement by means of valid and reliable marking systems.

13. Conform to reasonable standards of form and appearance. The report
should be attractive.

... "Interpreting the School to the ...," ..., April, 1933.

Robert O. Evans, Practices, Trends, and Issues in Reporting to Parents on
the Welfare of the Child in School. New York: Bureau of Publications,
Teachers College, Columbia University, ... 98 pages.

George E. Hill, "The Report Card in Present ...," ..., 115-131,
December, ...

The ordinary report card often fails to meet the fourth requirement in
the above list. It tends to neglect the less tangible but important
outcomes of education reflected in social and personal qualities. One
advantage of the informal report card or letter to parents is that it
attempts to inform parents on all phases of pupil growth. But it is the
spirit of the report rather than its form which is important. Indeed, a
curt note from the teacher may be worse than the usual report card.
Elsbree cites the following letter from a teacher to the parents of a slow
learner, which is a good illustration of "How to Lose Friends and
Influence Parents -- in the Wrong Direction":

Dear Parents:

Donald has improved at nothing except spelling, and that very little.

Sincerely,

Teacher


For use in the elementary school, Hill suggests the informal report to
parents, reproduced with slight modifications in Figure 54. A similar form
for the second half of the semester calls attention to improvements noted,
and invites further parental co-operation on other points. Neither the
report itself nor the letter accompanying it makes such demands upon the
teacher's time as does the personal letter, which should probably be
reserved for very special occasions. It is always a good idea, of course,
to apply the grease when the squeak appears. The letter suggested to
accompany the first report is as follows:


Dear (name of parent or guardian):

Now that the semester is one-half over we wish to call your attention to
________'s school progress. The enclosed report covers four kinds of
progress: progress in school subjects, health and physical condition,
attendance, and school citizenship. If you would like to talk over the
report, or to get more complete information on your boy's success in
school, we should be glad to have you come to see us. If you can telephone
us or send a note ahead of time, it will make it easier to arrange a
meeting.

The upper part of the report is for you to keep for future reference.
Please return only the lower part. We are especially anxious to get any
information from you that will aid us in helping your boy make a complete
success of his school work. Any information or suggestions you may wish to
write will be welcome.

Sincerely yours,
(Signed by teacher and principal)


Willard S. Elsbree, Pupil Progress in the Elementary School, page 76.
New York: Bureau of Publications, Teachers College, Columbia University,
1943.





REPORT FOR FIRST HALF OF THE FIRST SEMESTER

Name ________________

PROGRESS IN SCHOOL SUBJECTS: ________ is doing ________ work in ________
His work is good in ________
His work is poor and needs improvement in ________
His work in these subjects would probably be improved if ________

PHYSICAL CONDITION: Health habits and conditions needing attention ________

ATTENDANCE: Half days absent ____  Number of times tardy ____

REMARKS ________

SCHOOL CITIZENSHIP: We believe that every boy should be happy in
school, should take part in the life of the school, should get along well
with his classmates, and should develop good habits of honesty, courtesy,
neatness, consideration for the rights of others, and industry.

Your boy is especially strong in ________
He could improve in ________

(Tear off here and return this part of the sheet)

I have examined ________'s report for September and October.
Signed ________________ (Parent or guardian)

REMARKS OR SUGGESTIONS ________

Figure 54. A Suggested Informal Report to Parents. (After Hill.)





Suggestions for letters to parents. The art of writing effective letters
to parents will require special training and practice. To assist teachers
in acquiring this necessary skill, the schools of Santa Monica,
California, prepared a very helpful list of suggestions. The list in
somewhat abridged form is as follows:


1. Begin the letter with encouraging news.

2. Close with an attitude of optimism.

3. Solicit the parents' co-operation in solving the problems, if any exist.

4. Speak of the child's growth: social, physical, and academic.
   a. Social (citizenship traits)
      (1) Desirable traits: attention, care of property, co-operation,
          honesty, effort, fair play, etc.
      (2) Undesirable traits: selfishness, wastefulness, untruthfulness,
          dishonesty, carelessness, etc.
   b. Physical (health conditions)
      Posture, weight, vitality, etc.
   c. Academic
      (1) Interest in school and extra-school activities
      (2) Methods of work
      (3) Achievements: (a) growth in knowledge, appreciation, techniques;
          (b) list subjects in which child is making progress and those in
          which he is not making progress; (c) relationship of his accepted
          standards to his capacities

5. Compare the child's efforts with his own previous efforts and not with
those of others.

6. Speak of his achievements in terms of his ability to do school work.

7. Please remember that every letter is a professional diagnosis and
therefore is as sacred as any diagnosis ever made by any physician.

A more elaborate 21-page manual to guide teachers in the preparation of
letters to parents was prepared by the Omaha, Nebraska, school system.

The Colorado experiment. Although it is true that the aim of all
evaluation and reporting to parents is the complete development of the
child, it is often necessary to "temporize ideals with practical
considerations."

The experience of the Secondary School of Colorado State College of
Education is especially instructive. Detailed analytical evaluation sheets
were tried and abandoned primarily because of the excessive amount of
time required to prepare them. The use of the terms unsatisfactory,
satisfactory, and honors was given up because it was felt that any attempt
to evaluate pupils both in terms of their own ability and the objectives
of the curriculum is sure to involve negative reactions. Evaluations of
the ordinary ... abandoned because they afford only a partial ...
attempted and discontinued because ... and experiences instead of
reporting an ordinary picture of the pupil's growth and progress.
Conference meetings of counselor, teacher, and parents, although
successful for a time, had to be given up because of the failure of the
majority of parents to respond to the school's invitation to avail
themselves of these conference opportunities. The school eventually
prepared lists of "statements of trait actions" which were indicative of
the pupil's attainment of such general school objectives as
self-direction, social adjustment, breadth of interests, personal
attractiveness, care of materials and equipment, basic reading skills, and
the like. These were then evaluated on a five-point scale, H, S, N, U, O,
indicating distinctly superior, satisfactory, needs to make improvement,
unsatisfactory, and no evaluation, respectively.

Ibid., pages 83-84.

... and ..., "A Secondary-School Experiment in Marking ...," Educational
Administration and Supervision, 23:481-500, October, 1937.

The experiment continued for many years. Wrinkle summarized the
program as follows:


In the thirteen years which have elapsed, new forms and new practices have
been developed, tried, scrapped, and replaced by newer forms and practices.
Detailed analytical reports, scale-type evaluations, the conference plan,
anecdotal reports, and check-list type reports were developed and discarded
because they did not do a good job of conveying information or demanded too
much time. Repeatedly it was discovered that adequacy meant detail and
detail meant forms which were impractical for use in public school
situations. One criterion which resulted in the scrapping of many forms and
practices, including those which were successful in their use in the
laboratory school, was: Whatever is developed must be usable in the public
schools by public school teachers.


In May, 1945, a popular referendum was held in which all high-school
students participated; the general consensus was highly favorable, but
several changes were proposed. For example, 99 per cent of the students
thought they should always be allowed to see their scores on standardized
achievement tests; also, 90 per cent of the students thought that the
reports to parents should show how the actual achievement compared to the
expected achievement.

The University of Chicago High School system of reporting. The
University of Chicago High School plan illustrates a dual system of
reporting. At the end of each semester the parents receive a detailed
report in terms of the specific objectives of each course and whatever
comments are deemed necessary. A week or so after the detailed reports are
sent out and the parents have had an opportunity to study the strengths
and weaknesses of the pupil, the course marks are forwarded and are
usually accepted by the parent as incidental supplemental information.
Figure 55 illustrates one of the detailed semester reports in social
studies. The Chicago ...


William L. Wrinkle, "Reporting Pupil Progress," Educational Leadership,
2:293-295, April, 1945.




THE UNIVERSITY OF CHICAGO
The Laboratory School

SEMESTER REPORT, SOCIAL STUDIES III

Student ______________________  Date ________
        Last Name   First Name

Purposes                                            Rating   Comments (if any)

 1. Acquisition of basic information
 2. Reading Skills
    a. recognizing main ideas
    b. recognizing pertinent data
    c. social studies vocabulary
 3. Oral Skills
    a. presentation of ideas
    b. organization of ideas
    c. adequacy of content
 4. Writing Skills
    a. organization of ideas
    b. adequacy of content
 5. Ability to interpret social data
 6. Ability to apply principles in new situations
 7. Interest in current affairs
 8. Courtesy and cooperation in group situations

Habits of Work

 9. Persistence in overcoming difficulties
10. Tendency to work independently
11. Promptness in completing work
12. Application during study
13. Attention to class activities
14. Participation in class activities
15. Effectiveness in following directions

Pupil's Grade ________  Instructor ________________

Figure 55. A Report Card Used at the University of Chicago High School.



















E. Other Avenues of Public Information

School exhibits. There is no sounder principle of evaluation than that
contained in the statement, "By their fruits ye shall know them." Exhibits
afford one of the best ways of presenting the "fruits" of the school. The
public is evidently interested in local, county, state, national, and
international fairs and exhibitions of all types, and schools could make
use of this fact. Posters and displays of pupils' work, as well as public
programs of a dramatic, literary, or musical character, afford concrete
demonstrations of the school's educational program. Commencement programs
in which the pupils themselves play the leading roles afford an excellent
opportunity for the public to see the end products of the school. In the
final analysis, however, the ordinary everyday behavior of the pupil is
the best evidence of the worth of the school. What the pupil thinks and
what the pupil says are both important, but what the pupil is speaks a
still more eloquent language.

School visitation. Vicarious knowledge is important, but it is usually
a poor substitute for first-hand experience. Whenever possible, therefore,
the public should have an opportunity to see their school in actual
operation. The school should cultivate a reputation for friendliness. The
announced policy of the school should be "The latch string is always out."
It is a rare parent indeed who would not rather see his own child
"perform" than witness world-famous actors on television. Furthermore, to
observe the process of upholstering a chair or fashioning a dress is
inherently more interesting than merely to look at the finished product.

The parent-teacher association. The modern educator recognizes
more clearly than did his predecessor that education is a continuous
unified process, that several agencies contribute to its accomplishment,
and that of these the home and the school are most important. It is
self-evident, therefore, that there should be intelligent and wholehearted
co-operation between the home and the school. The local parent-teacher
association seeks through mutual understanding to effect this needed
co-operation. At its best, the association is a modern successor to
earlier visits of teachers to the pupils' homes and of the parents to the
school, both of which are increasingly difficult with the growth of the
school population and with the enlargement of the area served by the
individual school.

From the viewpoint of the home, the association affords an opportunity
for parents not only to hear about the school's program and philosophy and
to see the school in actual operation but also to ... The modern parent,
like the modern child, wants to be heard as well as seen. Certainly at all
times he is entitled to communicate in an accepting atmosphere. A free
interchange of feelings and ideas may be facilitated by the use of group
techniques such as role playing, sociodrama, and leaderless group
discussion.


F. Mobilizing Public Opinion


Sampling the opinion of parents. To what extent can the judgment
of parents be utilized in the evaluation and improvement of the school?
Eells attempted to use the opinions of the parents of seniors in
evaluating the secondary schools attended by their sons or daughters. He
employed a five-point scale ranging from "extremely satisfactory" at one
end to "extremely unsatisfactory" at the other. Twelve items were
included, relating to the general quality of instruction, development of
good character, training in good citizenship, guidance activities, and the
like. The principal of the school personally signed and mailed to the
parents of seniors in his school a double postal card containing the
following message:


To the Parents of Seniors:

Our school has been selected as one of two hundred high schools and other
secondary schools in the United States to be critically studied and
evaluated in an effort to improve the standards of secondary education
throughout the country. The study is not connected in any way with the
Federal Government.

... national study calls for a frank evaluation of the ... of parents. We
are asking parents of our seniors ... opinions concerning certain aspects
of our school as judged ... during their school life here. You are urged
to express ... judgment, whether it is favorable or unfavorable. You are
not asked ... to judge it. The card need ... Washington ... response from
the parents of pupils in this ... promptly, within a day or two.


Eells concluded that "the parents, on the whole, showed a marked ... as
judged by the scattering of the ratings along the scale ... "exceedingly
satisfactory," and more ... "... satisfactory" or "exceedingly ..." ...
that the guidance program of the typical school ... satisfactory, a
judgment supported by other ... of school program is more dependent for
its success upon ... co-operation than is guidance. "Regardless of whether
parents are ... judgments, it is important to know what these judgments
are," Eells points out, "for in the last analysis the ... of ..." Another
writer emphasizes the ... important ... to know what the public knows
about its schools, it is even more important to learn what the public
thinks about its ...

... "Sociodrama ... Psychodrama in the Teaching Situation," ...; Herbert
A. Thelen, "Human Dynamics in the Classroom," ...; and William Clark Trow
and others, "... of Group Behavior ...," Journal of Educational
Psychology, 41:322-338, October, ...

Walter Crosby Eells, "Judgments ... American Secondary Schools," School
and Society, 46:409-416, ...


Concluding statement. It is one of the fundamental beliefs of a
democracy that reliance can be placed on an enlightened public opinion. It
is to achieve this end that public schools are maintained. But it is
erroneous to assume that the responsibility ceases when the formal period
of instruction ends. In a changing world the continued enlightenment of
the adult population is increasingly recognized as a major responsibility
of a democratic society. No individual or group can be expected to think
or to act intelligently on anything without the necessary information. To
supply this information about the schools is the objective of the public
relations program. At all times the school will do well to keep in mind
the words of one of America's ablest statesmen, Abraham Lincoln:


Public sentiment is everything. With public sentiment nothing can fail;
without it nothing can succeed. Consequently he who molds public sentiment
goes deeper than he who enacts statutes or pronounces decisions.


Selected References for Further Reading

Elsbree, Willard S., Pupil Progress in the Elementary School. New York:
Bureau of Publications, Teachers College, Columbia University, 1943.
Chapter VIII.

Evans, Robert O., Practices, Trends, and Issues in Reporting to Parents on
the Welfare of the Child in School. New York: Bureau of Publications,
Teachers College, Columbia University, 1938. 93 pages.

Froehlich, Clifford P., and Darley, John G., Studying Students: Guidance
Methods for Individual Analysis. Chicago: Science Research Associates,
1952. 411 pages.

Rothney, John W. M., and Roens, Bert A., Guidance of American Youth: An
Experimental Study. Cambridge, Massachusetts: Harvard University Press,
1950. 269 pages.

Scott, William O., Desirable Objectives for Public Schools: An Opinion
Analysis. Unpublished Ph.D. dissertation, George Peabody College for
Teachers, Nashville, Tennessee, 1951. 236 pages.

Smith, Eugene R., Tyler, Ralph W., and Staff, Appraising and Recording
Student Progress. New York: Harper & Brothers, 1942. Chapters IX-XI.

Sykes, Gresham M., "The PTA and Parent-Teacher Conflict," Harvard
Educational Review, 23:86-92, Spring, 1953.

Thorndike, Robert L., Personnel Selection: Test and Measurement
Techniques. New York: John Wiley & Sons, 1949. Chapter II, "The Personnel
Selection Program."

..., ... E., ... New York: Harper & Brothers, 1945.

Wrinkle, William L., and ..., ... Democracy. New York: Farrar and
Rinehart, ... Chapter 49.

... C. Seyfert, "What the Public Thinks of Its Schools," School Review,
48:417-427, June, 1940.



17 


Some Present Trends 


In the preceding sixteen chapters and in the six appendixes on pages
429-465 a multitude of measurement problems receive attention. It seems
desirable, nevertheless, in this final chapter to present a brief overview
of current trends.

Reliability. The extreme emphasis upon high reliability coefficients
which characterized educational and psychological measurement during
the 1920's and 1930's has died down, though when a decision concerning
an individual's future status in a certain trait is being based upon a
single test, considerable stability is needed. For predicting a criterion,
several well-constructed but short and hence only moderately reliable
tests are usually better than one relatively more reliable instrument. The
short tests should correlate with each other as near zero as possible, but
each should correlate well with the criterion to be predicted.
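
The advantage of short tests that are nearly uncorrelated with each other
can be illustrated with the standard formula for the multiple correlation
of a criterion with two predictors. The sketch below is offered only as an
illustration; the function name and the validity figures are assumed for
the example and are not taken from any study cited in this chapter.

```python
import math


def multiple_r(r1c, r2c, r12):
    """Multiple correlation of a criterion with two predictor tests.

    r1c, r2c: correlation of each test with the criterion.
    r12: correlation between the two tests.
    """
    if not -1.0 < r12 < 1.0:
        raise ValueError("r12 must lie strictly between -1 and 1")
    r_squared = (r1c ** 2 + r2c ** 2 - 2 * r1c * r2c * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Two hypothetical short tests, each correlating .40 with the criterion.
uncorrelated = multiple_r(0.40, 0.40, 0.0)   # tests unrelated to each other
overlapping = multiple_r(0.40, 0.40, 0.8)    # tests largely duplicating each other
print(round(uncorrelated, 3), round(overlapping, 3))
```

With these assumed figures the two unrelated tests together predict the
criterion noticeably better than either alone, whereas the two overlapping
tests add very little to each other.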

Spuriously high single-form reliability coefficients may be obtained by
two different methods: administering the test to an extremely
heterogeneous group, or applying split-half or Kuder-Richardson
computational procedures to a highly speeded test. If the examinees upon
whom any reliability coefficient is based have more variable scores (a
higher standard deviation) than your testees, then the reliability
coefficient secured for your group will in all likelihood be lower than
theirs.
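
The effect of group variability on a reliability coefficient can be
sketched numerically. The example below assumes, as the usual textbook
adjustment does, that the standard error of measurement is the same in
both groups; the function name and the figures are hypothetical.

```python
def adjusted_reliability(r_published, sd_published, sd_local):
    """Estimate reliability in a group of different variability.

    Assumes the standard error of measurement is equal in both groups,
    so that 1 - r is proportional to 1 / variance.
    """
    if sd_published <= 0 or sd_local <= 0:
        raise ValueError("standard deviations must be positive")
    variance_ratio = (sd_published / sd_local) ** 2
    return 1.0 - variance_ratio * (1.0 - r_published)


# Hypothetical case: a manual reports r = .90 for a group with SD = 15,
# but the local class is more homogeneous, with SD = 10.
print(round(adjusted_reliability(0.90, 15.0, 10.0), 3))
```

Under these assumptions the coefficient to be expected in the more
homogeneous class is considerably lower than the published one, which is
the point made in the preceding paragraph.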

Validity. The American Psychological Association's Committee on Test
Standards lists four types of validity:


... Cronbach (Chairman), "Technical Recommendations for Psychological
Tests ...: A ... Proposal," American Psychologist, 7:461-475, ... of the
American Psychological ... and ... the American Psychological Association.




... training or some similar activity ... An academic achievement test is
most often examined for content validity.

... validity is established when the investigator demonstrates ...
corresponds ... between scores on the test and other indicators of the
state or attribute.


The Committee states that "the [test] manual should make clear what
type of inference the validation study supports. No manual should report
that 'this test is valid.' In the past, evidence that is not appropriately
termed evidence of validity has been presented in the manual under that
heading." The test user should ask himself and the test salesman, "Valid
for what?" For instance, does the test predict success in the first year
of college reasonably well? Does it correlate substantially with current
level of aspiration? Is it based upon a careful sampling of the content
and operations in a given set of textbooks, course units, or syllabi? How
highly does it correlate with similarly named tests? Of course, few tests
are valid in all four of the above senses, but the user will want to be
sure that the test has the kind and degree of validity he needs.
The criterion problem. In recent years the thing-to-be-predicted has
been shown to be of crucial importance, since even the best possible test
cannot predict an extremely faulty criterion well. The criterion may not
be reliable enough, it may not be completely relevant, and it may be
immediate or intermediate when some more ultimate behavior needs to be
predicted. As an example, take an attitude inventory which attempts to get
at "good citizenship." No matter how carefully constructed this instrument
is, scores obtained on it by the pupils in a given class will not
correlate well with citizenship ratings assigned to them if those ratings
are themselves unreliable. Furthermore, the school is probably quite
interested in the adult citizenship behavior of the former student, so
unless the immediate criterion (the teacher's ratings) is highly
correlated with the ultimate criterion, the status validity of the
inventory may differ considerably from its predictive validity.
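
How an unreliable criterion limits the validity that can be observed may
be sketched with the classical attenuation relation, in which the observed
correlation equals the correlation between true scores multiplied by the
square root of the product of the two reliabilities. The function name and
all figures below are assumed for illustration only.

```python
import math


def attenuated_validity(true_r, test_reliability, criterion_reliability):
    """Observed test-criterion correlation under classical attenuation."""
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_r * math.sqrt(test_reliability * criterion_reliability)


# Hypothetical inventory: true-score correlation .60 with "citizenship,"
# test reliability .90, but teacher ratings with reliability only .40.
print(round(attenuated_validity(0.60, 0.90, 0.40), 2))

# Even a perfectly valid test (true correlation 1.00) could not exceed this:
print(round(attenuated_validity(1.00, 0.90, 0.40), 2))
```

The second figure is the ceiling imposed by the two reliabilities alone,
which shows why improving the test cannot compensate for a faulty
criterion.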

... "The Validity of Educational Tests," Harvard Educational Review, ...

... Test Service Notebook No. 3 ...

... Educational and Psychological Measurement, 12:707-719, ...

For a comprehensive discussion of ... particularly applicable to
education, see Edward E. Cureton, "...," in E. F. Lindquist (Editor),
Educational Measurement. Washington, D.C.: American Council on Education,
1951.

Ratings may usually be made more reliable by having several
well-informed persons rate each individual and then taking the mean of
their ratings. If the raters have not all had considerable opportunity to
observe the ratees with respect to the characteristic being rated,
however, this process may result in some loss of relevance.
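
The gain in reliability from averaging the ratings of several judges, as
described above, can be estimated with the Spearman-Brown formula. The
sketch assumes that the raters are interchangeable (equally reliable and
equally intercorrelated); the function name and figures are hypothetical.

```python
def spearman_brown(single_rater_reliability, number_of_raters):
    """Estimated reliability of the mean of k comparable raters."""
    if number_of_raters < 1:
        raise ValueError("at least one rater is required")
    r, k = single_rater_reliability, number_of_raters
    return k * r / (1.0 + (k - 1) * r)


# Hypothetical case: one teacher's ratings have reliability .50.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.50, k), 2))
```

The formula says nothing about relevance; if the added raters know the
pupils poorly, the pooled rating may be more stable and yet less valid.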

Nearly all of the criteria used in predictive validity studies are
intermediate: success or failure in medical school, rather than
competence as a practicing physician; passing or failing in flying school,
instead of achievement in combat; grades in the teacher-training
curriculum, not performance on the job ten years after graduation; and
score on the final training-school exam in lieu of competence as an
automobile mechanic. Indeed, most "ultimate" criterial measures are hard
to get, all too unreliable, and of doubtful relevance. This is illustrated
rather dramatically by the numerous attempts to determine what a
"competent teacher" is.⁶

Factor analysis. Since the early 1930's an increasingly large number of
measurement specialists have worked both theoretically and practically
with factor analysis. The well-known Primary Mental Ability tests (PMA)⁷
had their origins in factor analyses performed by Louis L. Thurstone,
whereby he used mathematical methods to identify a few "factors" (Verbal,
Word Fluency, Number, Space, Memory, Reasoning, and Perceptual Speed)
which accounted for most of the positive correlations among a large
number of mental tests.

The Holzinger-Crowder Uni-Factor Tests for Grades ... through XII
are based upon factorial studies and contain verbal, spatial, numerical,
and reasoning subtests. They first appeared in 1952.⁸

Factor analysis has also been used frequently to provide information
concerning what a test battery such as the Differential Aptitude Tests,⁹
the Wechsler Intelligence Scale for Children (WISC),¹⁰ or the Revised
Stanford-Binet¹¹ is measuring.

Achievement vs. intelligence vs. aptitude tests. The old familiar
classification of ability tests into three types, achievement,
intelligence, and aptitude, has been challenged severely by correlational
studies. This is especially true of the eight Differential Aptitude
Tests,¹² which include all three kinds: Verbal Reasoning, Numerical
Ability, Abstract Reasoning, ... objective attempts to allow for quite
different norm groups "... the high correlations of the Verbal Reasoning
and Numerical Ability tests with intelligence tests ... apparently can
serve most purposes for which a general mental ability test is usually
given, in addition to providing differential clues useful to the
counselor. Hence the use of the so-called intelligence test is apparently
unnecessary where the Differential Aptitude Tests have already been used."

⁶ Simeon J. Domas and David V. Tiedeman, "Teacher Competence: An Annotated
Bibliography," Journal of Experimental Education, 19:101-218, December,
1950.

⁷ Devised by Louis L. and Thelma Gwinn Thurstone and published by Science
Research Associates.

⁸ Devised by Karl J. Holzinger and Norman A. Crowder and published by the
World Book Company.

⁹ Jerome E. Doppelt, The Organization of Mental Abilities in the Age Range
13 to 17. Contributions to Education, No. 962. New York: Bureau of
Publications, Teachers College, Columbia University, 1950. 86 pages.

¹⁰ Elizabeth P. Hagen, "A Factor Analysis of the Wechsler Intelligence
Scale for Children," American Psychologist, 6:297, July, 1951. Abstract.

¹¹ Lyle V. Jones, "A Factor Analysis of the Stanford-Binet at Four Age
Levels," Psychometrika, 14:299-331, December, 1949.

¹² Abbreviated DAT, designed for Grades 8-12, and published by the
Psychological Corporation.

Differential prediction. A persistent guidance problem, still largely
unsolved, is to estimate differential success in a variety of fields. Will
John "make" a better engineer than lawyer? According to his high school
grades and intelligence test scores he would probably pass either
curriculum in college. Can the counselor organize all the information
concerning John in a way which will enable the counselor to predict with a
fair degree of confidence that success in one college field is more
probable than in another?

As currently attempted, the solution is attained largely by rule-of-thumb,
"common sense" procedures which rely heavily upon intuition and "armchair
validity." If John has high mechanical and scientific interest scores on
the Kuder Preference Record and high Numerical Ability, Space Relations,
and Mechanical Reasoning scores on the DAT, while he is somewhat lower on
the Kuder persuasive category and the DAT Verbal Reasoning, Abstract
Reasoning, and Language Usage tests, very likely he will be counseled
toward engineering instead of law.

This is an unsatisfactory method, however, since it relies too heavily
upon assumed validity and subjective weighting of the various test scores
to arrive at a "felt" probability of success in one field versus the
other. For some time statisticians have been evolving methods of profile
and discriminatory analysis to make differential prediction objective.
Though this literature has barely touched educational measurement yet, it
does seem to have great importance for counselors. Perhaps the most easily
understood articles for the interested student to read are Tiedeman's and
Rulon's.
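
To suggest what an objective weighting of test scores might look like, the
sketch below computes a two-group linear discriminant function (Fisher's
method) from two test scores. This is one common form of discriminatory
analysis and is not claimed to be the procedure of the articles cited; the
function names and all scores are invented for illustration.

```python
def mean_vector(rows):
    """Mean of each of the two test scores across persons."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(group_a, group_b):
    """Pooled within-group 2 x 2 covariance matrix."""
    sums = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    sums[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[sums[i][j] / dof for j in range(2)] for i in range(2)]


def discriminant(group_a, group_b):
    """Return (weights, cutoff) separating group A from group B."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    s = pooled_covariance(group_a, group_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    cutoff = sum(w[i] * (ma[i] + mb[i]) / 2 for i in range(2))
    return w, cutoff


def classify(scores, w, cutoff):
    """Assign a person to the group his weighted score resembles."""
    composite = w[0] * scores[0] + w[1] * scores[1]
    return "engineering" if composite > cutoff else "law"


# Invented (numerical, verbal) scores of successful students in each field.
engineers = [(70, 50), (75, 55), (80, 52), (68, 48)]
lawyers = [(55, 70), (50, 75), (58, 72), (52, 68)]
weights, cutoff = discriminant(engineers, lawyers)
print(classify((72, 50), weights, cutoff))
print(classify((52, 72), weights, cutoff))
```

The weights are derived from the records of students already known to have
succeeded in each field, so the counselor's "felt" weighting is replaced
by one that can be checked against later outcomes.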

George K. Bennett, Harold G. Seashore, and Alexander G. Wesman, ...
New York: Psychological Corporation ...

... "The Utility of the Discriminant Function ...," ...




The "whole" child. Emphasis has shifted somewhat from strictly
objective measurement of specific traits to co-operative evaluation and
appraisal of the "whole" child. Such abstract characteristics as
"respecting the rights of others," "participating democratically in group
activities," and "developing habits of good citizenship" occupy the
attention of teachers bent upon comprehensive evaluation. Grading each
child in relation to the class norm is minimized in the "modern" school,
where pupils compete with their own past records.

To a considerable extent this "wholistic" approach is congruent with
developments in educational philosophy and psychology since 1925, though
at times it has resulted in a flight to complete subjectivism, with
consequent abandonment and ridicule of objective tests. Some teachers even
take the setting up of objectives to be synonymous with evaluating these
objectives. Cureton and Rulon have called attention to the extremely loose
thinking involved in much current "evaluation." A quotation from the
former sets forth this point of view:

Among the abstractions which we must at present consider intrinsically
invalid, we find most of the action series that go to make up "worthy home
membership," "good citizenship," "democratic attitude," "loyalty," and
many of the other ultimate aims of education. On the other hand, "command
of fundamental processes" does lead to essential agreements. We can fairly
well specify the acts performed in appropriate situations by persons to be
designated as "having such command," the acts performed in similar
situations by persons who are to be labeled as "lacking such command," and
the materials upon which the acts are to be performed and the bases upon
which the acts are to be classified and scored as successful or
unsuccessful. Those educators who insist (and rightly, we believe) that
other aims are at least equally important, and in aggregate probably much
more important, would advance their cause most rapidly and effectively by
setting about the task of specifying the materials, actions, situations,
and scoring criteria implied by the abstract terms which denote these
other aims. They will find the task difficult, but in most cases possible.
When they have accomplished it they will find that teachers will use the
materials, set up appropriate school situations, and teach the desired
acts. In those few cases where the task turns out to be impossible the
abstract aim must be admitted to be intrinsically invalid.


Qualitative and semi-quantitative evaluation techniques. In
response to the recent emphasis upon evaluating non-intellectual aspects of
the child's behavior there have arisen several new procedures. Anecdotal
records, samples of revealing behavior recorded shortly after their
occurrence by the observer and made a part of the child's cumulative record,
are widely used by teachers. Various types of ratings (of one's self, by
peers, by teachers, and by parents) have become popular. Some highly


Edward E. Cureton, op. cit.

Edward E. Cureton, op. cit., page 652. Italics added.
qianw f Compared with teat .core. a> m the atudy bv Julian O 

40^ \ovcmber 19 “ 1° °°= ' ”/ Educalwool Psychology 42 3M 



SOME PRESENT TRENDS 




refined standardized rating scales have appeared, but the majority have
been prepared locally. Check lists are used frequently, too.

Somewhere between the tests of ability and sheer rating scales are the
numerous inventories which employ forced-choice methods. The
Allport-Vernon-Lindzey Study of Values, illustrated on pages 176 and 190, consists
of questions to which there are no objectively “right” or “wrong” answers.
The Kuder Preference Record has triad items, each containing three
activities from which the examinee is to pick the one he likes most and the
one he likes least; obviously, this is a 1-2-3 ranking arrangement. Neither
of these inventories was prepared in a purely subjective manner, for both
had to meet certain statistical criteria.

Various sociometric techniques have been devised for disclosing
relationships among members of a group or between groups. Frequently these
are of the “Which five children in the class would you rather sit next to?”
type. By this means ... information concerning social isolates and
popular ... them to individualize ... sociological innovations which may
at ... are role playing, psychodrama, and ... in public schools.

Projective techniques, which until recently were used chiefly with
neurotic or psychotic adults, have ... been adapted to the study
of relatively normal children. A projective instrument allows the subject
to “project” his anxieties, fears, ... frustrations in a partially
unstructured situation where he ..., thus revealing these inner feelings.
The specially prepared ambiguous ... may be a particular set of
inkblots, as in the Rorschach; pictures of human beings, as in the Thematic
Apperception Test (TAT); selected pictures of animals, as in the Children's
Apperception Test (CAT); incomplete sentences; Make a Picture Story Test;
Draw-a-Person Test; doll play; and ... For a discussion of the “Development
and Applications of Projective Tests of Personality,” including a 94-item

■ , to knew more about raU-f 

i.The student uh“ Robert A i°953 362 page, 

lists may consult Arvil S “art, ^ g bipp'ncott ^ 132pages 

Keiearchand Columbia Dniversitv 1951 

.• See Georgs Sharp turn Teachers CWW personality and 

New York follow up atudy/" „t FSendship Cho.CTiS 

Arthur Singer 'cport® ^ Modes and Ctonsta^y p„i,„ent A 

Their Relation to Certa G 1“““’ “ 30-65, ho 2, 

nal of Educational Researc , 0 jouma ] , , , 

Thelen, -'Human ^ f 

Group Discussion. Joum 




bibliography for 1949-52, see Rothney and Heimann. A severe limitation
of projective techniques for the teacher or administrator is that, not being
tests in the usual sense, they require for proper administration and
interpretation far more clinical training than the nonspecialist can hope to
acquire. Especially in this area a little knowledge can be a mighty dangerous
thing.

Novel tests and items. Some of the newer tests of “general mental
ability” are mentioned by Stanley. These include the multi-score Wechsler
Intelligence Scale for Children (WISC), an individual test which yields
two separate IQ's, performance and verbal; the Arthur Adaptation of the
Leiter International Performance Scale, an untimed test given without
verbal instructions which should be useful for testing young children with
physical and linguistic handicaps; the Northwestern Intelligence Tests,
developmental scales for infants 4-36 weeks of age that yield IQ's; the
Davis-Eells Games for Grades I-VI, meant to be “fair” to children from
all socio-economic levels; and Goossen's ingeniously disguised six-item
intelligence test for public-opinion pollsters, which masquerades as an
interview measure of knowledge of current events.

A widespread recent effort, energized by the Progressive Education
Association's eight-year study, has been to measure understanding rather
than merely memorization. The Forty-Fifth Yearbook of the National
Society for the Study of Education, Part I, entitled “The Measurement of
Understanding,” represents a systematic attempt to outline principles helpful
in constructing tests that tap this “higher” type of knowledge. Enough has
already been done with such instruments as the PEA Interpretation of
Data Test, the Cooperative English Test C (“Reading Comprehension”),
the Tests of General Educational Development (GED), and the Watson-


John W. M. Rothney and Robert A. Heimann, Review of Educational Research,
23: 70-84, February, 1953.

Julian C. Stanley, “Development and Applications of Tests of General Mental
Ability,” Review of Educational Research, 23: 11-32, February, 1953.

Devised by David Wechsler and published by the Psychological Corporation.
Devised by Grace Arthur and published by the Psychological Service Center Press.
Devised by Adam R. Gilliland and published by Houghton Mifflin Company. The
use of IQ's for children less than four or five years of age is open to serious question,
however.

Published by World Book Company. For mention of the studies underlying these
tests turn back to page 278.

Carl V. Goossen, “The Goossen Hidden Intelligence Test,” Public Opinion
Quarterly, 14: 759-766, Winter, 1950.

Eugene R. Smith, Ralph W. Tyler, and staff, Appraising and Recording Student
Progress. 550 pages. New York: Harper and Brothers, 1942.
Chicago: University of Chicago Press, 1946. 338 pages.

Devised by Frederick B. Davis, Harold V. King, and Mary Willis and published
by the Cooperative Test Division of Educational Testing Service.

Prepared by the Examinations Staff of the United States Armed Forces Institute
and distributed by the Cooperative Test Division of Educational Testing Service.





Glaser Critical Thinking Appraisal to indicate clearly that a frightened
retreat to the essay test is not the only way to measure understanding, or
indeed the most desirable one. When properly constructed, objective-type
items measure much more than merely recognition. For many purposes
now served by essay and completion tests, objective-type tests would be
more suitable if prepared in accordance with well-known measurement
principles. This is not to deny that, as often constructed, objective items

ploriiig the -"'P'-'rj te“ ‘enS 

better than chance alone, acquiring additional information 

ihanco Tlie educator is concerned with acqmring 

upon which to base ^ , shall I concentrate upon helpmg 

reading’ Should Jean tab already doing well enough’ 

loo adjust better ‘o ^ P’ decision centers around how much 

A crucial aspect of making an ^ 

information one already Ins mu admmister an inteUi 

can contribute For mstarice, the average correla- 

gence test to his class’ « <ind the pupds' 

tion bet^veen teachers interview estimates of mtelli 

intelligcnce-tcst scores to be psychological Examination 

genco^correlatcd Test, wbla the 

and CO with the 77 Thus if the teacher has plenty of 

ACEPE and the OSUPT using an intelligence test to gam 

testing time available, he can |,s pupils that he does not 

some information . us„„,re easily otherwuse, but the mcr 

already have and f ler hand test time - 

ment mas not be large “ ^u„ster a test of some 
usually is), he may ‘“,”7 ,^33 rehable and less valid than 
characteristic, even thougn 

Glaser and publ.hed by 

« Devised by erhoaru and UtiMy 

for Psycho netneProhUm Making Psjchdogical Bulhlm 19 o 4 

of Education University oiiu decision Making 

Tc,ta-Do They Agree’ Educe, ^ U, jrtennew Educal,o«d arui F,y- 

'°?josepbV Hanna 'AutlSm U>5» 

chdogccol Memurement 10 





Suppose, for example, that the correlation between the teacher's
estimates of his pupils' mental health and the best available criterion of mental
health is only .05, while the mental health test correlates .30 with this
criterion. Then, even though the test has what is usually interpreted as
low validity, still it contributes to the teacher's very meager initial
information concerning the mental health of his students. Therefore, he
would appear to be well advised to give the mental health test in lieu of
the intelligence test if both compete for the same time, particularly when
important decisions having to do with the mental health of the pupils
must be made. This approach stresses two considerations: the accuracy
of the judgment that can be made without the test and the importance
of the area tested.
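
One rough way to see how much the test adds to the teacher's judgment is to compare the proportion of criterion variance each source accounts for. The sketch below uses the two correlations quoted above (.05 and .30). It also assumes, purely for illustration, that the teacher's estimates and the test scores are uncorrelated with each other; the text does not give that figure, and the function and variable names are hypothetical.

```python
import math


def multiple_r_squared(r1: float, r2: float, r12: float = 0.0) -> float:
    """Squared multiple correlation of a criterion with two predictors.

    r1, r2: correlation of each predictor with the criterion.
    r12:    correlation between the two predictors (assumed here, not given).
    """
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


teacher_r = 0.05   # teacher's estimates vs. criterion (from the text)
test_r = 0.30      # mental health test vs. criterion (from the text)

teacher_only = teacher_r ** 2
combined = multiple_r_squared(teacher_r, test_r, r12=0.0)

print(f"variance accounted for by teacher alone: {teacher_only:.4f}")
print(f"variance accounted for by teacher + test: {combined:.4f}")
print(f"combined multiple R: {math.sqrt(combined):.3f}")
```

Under that assumption nearly all of the predictable variance comes from the test, which is the sense in which a test of low validity can still be worth giving when prior information is meager.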

Thus the benefit from a test is not a function only of the test itself, but
also of the decisions to be made with it. The test is just one step toward
the goal of efficient decision making. In the classroom situation decisions
can be changed as further information is acquired. Viewed from this
standpoint, deciding tentatively on the basis of prior evidence and a low score
on a mental health test that Bill is having adjustment difficulties does not
classify him irrevocably as maladjusted. With its validity coefficient of
only .30, the test will yield quite a few “false negatives,” persons who
score low on it but are not poorly adjusted. These will be discovered by
the alert teacher during further screening, when he works with all low
scorers more closely than heretofore.

It is important to measure a variety of characteristics, even though
somewhat inaccurately, to know your risk, and to follow through with
subsequent checks. Interviews, essay tests, and projective tests are not rifles
aimed at a narrow target; rather, they are sawed-off shotguns spraying
rather wildly but frequently hitting the mark, while at the same time
nicking some innocent bystanders.

It is too early to tell how much this promising-appearing application of
utility theory will contribute to measurement. The interested reader may
follow developments in the journal literature by means of subject and
author indexes in Psychological Abstracts.


Selected References for Further Reading

Coombs, Clyde H., A Theory of Psychological Scaling. Ann Arbor: Engineering
Research Bulletin No. 34, University of Michigan, May, 1952. 94 pages.
Cureton, Edward E., “The Principal Compulsions of Factor Analysts,” Test Service
Notebook No. 4, World Book Company, (1949).

Doppelt, Jerome E., “The Correction for Guessing,” pp. 1-4 in Test Service
Bulletin No. 46, The Psychological Corporation, January, 1954. Free.
Jahoda, Marie, Deutsch, Morton, and Cook, Stuart W., Research Methods in Social
Relations, with Especial Reference to Prejudice. New York: The Dryden Press,
1951. Parts One and Two.





Lord, Frederic, “A Theory of Test Scores,” Psychometric Monograph No. 7, 1952.
84 pages.

“Technical Recommendations for Psychological Tests and Diagnostic Techniques,”
Psychological Bulletin, 51: 1-38, March, 1954.



Appendices 



APPENDIX 


A 

Fifty Questions to Help You 
Learn Statistics 


knonWgc of the nnlcrnl '" Chap onsoenng them Rather than 

meant to be eorreet Consult jour Iwk Ir« of paper and 

ante m jour book, copj cornet o'pt.on (A, B. C, D, or £1 

put after etch number the ‘ they will probablj not merease 

Unless you work on the quest, one „„ ^ell as jou possiblj 

jour undcislandms 'en and eeplanaUons appear Your ^r 

can, turn to pigcs ■IJO-dOS, where an e a„a.liaif 

eenlage score, corrected for chance, equals 
of the numlier rong 


15G-I5S 


. ♦ nf the ? fonnula appears on 

. 2K - ib' explanation of the n j 


Test ScoreH* 


210-219 ■ s/a^ '15 2:/d’ - " 

Average,’ x SoxiS, 

(New ork John H 429 





The First 14 of the Following Questions Refer to the Above Test Scores


1. If $\Sigma$ means “the sum of,” then $\Sigma f$ =

A. -15
B. 11
C. 50
D. 239
E. 315

2. The size of the interval of each class
in the above distribution is

A. 4.5
B. 5.0
C. 9.0
D. 10.0
E. 10.5

3. The fractional class limits of the
highest class are

A. 309.50-319.50
B. 309.50-318.50
C. 309.95-319.95
D. 310.50-318.50
E. 310.50-319.50

4. The midpoint of the middle class
(260-269) is

A. 259.5
B. 264.5
C. 265.0
D. 269.0
E. 269.5

5. The arithmetic mean,
$\text{assumed mean} + i \times \frac{\Sigma fd}{N}$, is

A. 260.3
B. 261.5
C. 263.0
D. 264.5
E. 267.5

6. The median (50th percentile) is

A. 258.7
B. 259.5
C. 260.3
D. 264.5
E. 267.3

7. The mode is

A. 12.0
B. 259.5
C. 260.0
D. 264.5
E. 265.0

8. The 25th percentile ($Q_1$) is

A. 230.1
B. 240.4
C. 244.5
D. 248.9
E. 249.5

9. The 75th percentile ($Q_3$) is

A. 267.0
B. 269.8
C. 270.2
D. 272.0
E. 274.5

10. $Q$, the semi-interquartile range,
$\frac{Q_3 - Q_1}{2}$, is

A. 8
B. 12
C. 17
D. 20
E. 23

11. The 10th percentile is

A. 245.8
B. 240.0
C. 239.5
D. 239.0
E. 234.5

12. The 90th percentile is

A. 299.5
B. 299.0
C. 294.5
D. 277.8
E. 276.1

13. $0.4D = 0.4 \times$ (90th percentile −
10th percentile) =

A. 20
B. 22
C. 25
D. 31
E. 55



QUESTIONS ON STATISTICS 




14. Standard deviation =
$\frac{i}{N}\sqrt{N\Sigma fd^2 - (\Sigma fd)^2}$ =

A. 2
B. 15
C. 17
D. 20
E. 22

15. In a frequency distribution the size
of the interval of the class whose
lower and upper real limits are 9.5
and 19.5 is

A. 11.0
B. 10.0
C. 9.0
D. 5.0
E. 4.5

16. In a frequency distribution the
midpoint of the class whose lower
and upper real limits are 99.5 and
109.5 is

A. 107.0
B. 105.0
C. 104.5
D. 102.5
E. 102.0

17. The main reason for grouping into
class intervals as a step toward
carrying out calculations of statistical
measures by hand (that is,
without using a mechanical calculator)
is to

A. reduce the amount of labor.
B. reduce the frequency of clerical ...
C. permit the calculation of measures
other than the mean.
D. bring out important trends in ...
E. ... the identity of the persons
tested.

18. ... purposes the first step is to

A. determine ...
B. determine the whole-number
limits of the classes.
C. determine the real limits of the
classes.
D. decide upon the number of
classes.
E. select the class interval.

19. What is the most serious criticism
to be made of the following frequency
distribution of test scores,
where the real class limits are
fractional?

Whole-Number
Class Limits     Frequency
44-48                1
40-44                2
36-40                0
32-36                0
28-32                2
24-28                6
20-24                5
16-20               23
12-16               24
 8-12               37
 4- 8               33
 0- 4               20
-4- 0                3

                 N = 161

A. Negative scores occur.
B. Thirteen classes are used.
C. There are too many low scores.
D. The class midpoints are not divisible
by the range of scores in
each interval.
E. The whole-number class limits ...

20. The 60th percentile is the point in
a distribution

A. where a student has answered
40 per cent of the questions ...
B. that marks the distance from
the median that includes 60 per
cent of the cases.
C. below which are 40 per cent of
the cases.
D. below which are 60 per cent
of the cases.
E. above which are 60 per cent of
the cases.



21. The midscore of the following five scores
(4, 6, 7, 5, 4) is

A. 6.0
B. 5.5
C. 5.2
D. 5.0
E. 4.8

22. Given ... scores ... How many points
difference is there between the midscore
and the median when the
median is computed from a frequency
distribution of those scores
where the class interval is 1?

A. 17.00
B. 0.67
C. 0.33
D. 0.17
E. 0.00

23. The arithmetic mean of the following
scores (4, 5, 7, 6, 4) is

A. 6.0
B. 5.5
C. 5.2
D. 5.0
E. 4.8

24. The measure of central tendency to
use when reporting data concerning
wages in order to avoid the undue
influence of a few extreme salaries is
the

A. standard deviation
B. quartile deviation
C. median
D. range
E. mean

25. The term “average” as used in
arithmetic textbooks refers to

A. variability
B. the mode
C. central tendency
D. the median
E. the arithmetic mean

26. From the standpoint of statistics,
the term that means the same thing
as “average” is

A. normal
B. median
C. mode
D. central tendency
E. mean

27. What is the arithmetic mean of the
following distribution?

Score  f  d  fd

0 3 7 0

4 7 3 +1

N = 10

A. 3.5
B. 3.2
C. 2.7
D. 2.4
E. 2.2


28. $D = (P_{90} - P_{10})$. This is a measure
of

A. variability
B. correlation
C. central tendency
D. averageness
E. modality

29. The percentage of scores lying between
$Q_1$ and the median is

A. 25
B. 14
C. 50
D. 68
E. a variable quantity that depends
upon the score distribution

30. What is the standard deviation of
the following distribution?

Score  f  d  fd  fd²

  4    3
  2    4
  0    3

A. 4.90
B. 4.00
C. 2.40
D. 2.00
E. 1.55





31. For the following distribution, the
quartile deviation or semi-interquartile
range, $Q$, is

Score   f
  8     1
  6     2
  5     4
  3     4
  2     3
  0     2

    N = 16

A. 2.17
B. 2.08
C. 1.79
D. 1.54
E. 1.04

32. For the distribution in the preceding
item, the median is

A. 1.75
B. 3.00
C. 3.25
D. 3.62
E. 4.00

33. On a test with a standard deviation
of 20 and an arithmetic mean of 80,
an individual with a raw score of 70
will have a z-score of

A. -10.0
B. -0.5
C. -0.1
D. 0.5
E. 5.0

34. What rank should be assigned to a
score of 95 in the following distribution,
if the rank of the lowest
score, 93, is 9?

Score
97
97
96
95
95
95
94
94
93

A. ...
B. 6.0
C. 7.0
D. 4.0
E. 5.0

35. How does the mean of N consecutive
untied ranks compare with the
mean of N ranks in which one or
more ties occur?

A. Former is larger
B. No difference
C. Latter is larger
D. Depends upon the number of
ties
E. Depends upon where the ties
occur

36. The Pearson product-moment coefficient
of correlation, $r$ or $r_{xy}$,
may vary between

A. -2.00 and +2.00
B. -1.00 and +1.00
C. -0.92 and +0.92
D. 0.00 and +1.00
E. 0.00 and infinity

37. A teacher computed a correlation
coefficient between scores on a reading
test and scores on the Cooperative
Test of Contemporary Affairs,
obtaining a value of .92. She was
justified in concluding that, as
measured by these two tests,

A. knowledge of current affairs and
reading ability are closely related.
B. knowledge of current affairs and
reading ability are unrelated to
each other.
C. knowledge of current affairs and
reading ability are perfectly related.
D. the coefficient must have been
computed incorrectly.
E. wide knowledge of current affairs
is the result of good reading
ability.

38. Which one of these r's has the least
predictive value?

A. .91
B. .50
C. .17
D. -.23
E. -1.00





39. A student computed a Pearson
product-moment coefficient of correlation,
$r$, between paired scores
in two distributions, X and Y, and
found it to be 1.05. We are absolutely
certain that

A. he has freakish data.
B. he should have computed Spearman's
rank-difference coefficient
of correlation, rho, instead.
C. the means of the two distributions
differ.
D. the correlation between X and Y
is high.
E. the r has been computed incorrectly.

40. If the X distribution is divided into
12 classes and the Y distribution is
also divided into 12 classes, the
number of tally marks in the scatter
diagram will be

A. N
B. ...
C. 12
D. 24
E. 144

Multiple-Choice Analogies (41-50)

Directions: Each of the following ten
items represents an analogy. In every
case the first two terms of the item are
related to each other in some way and
the third term is related in the same way
to one of the last five.

Example: Shoe is to foot as hat is to

A. arm
B. hair
C. hand
D. head
E. leg

Option D, “head,” is of course correct.

41. Arithmetic mean is to central tendency
as standard deviation is to

A. average
B. variability
C. Q
D. D
E. relationship

42. $Q_3$ is to 75th percentile as median
is to

A. 90th percentile
B. 75th percentile
C. 50th percentile
D. 25th percentile
E. 10th percentile

43. Arithmetic mean is to $\sigma$ as median
is to

A. $Q_1$
B. $Q_2$
C. $Q_3$
D. $Q_4$
E. $Q$

44. Frequency distribution is to median
as ungrouped measures arranged in
order of magnitude are to

A. range
B. mode
C. class interval
D. mean
E. midscore

45. $Q_3 - Q_1$ is to 50% as, for a “normal”
distribution, Mean ± 1 SD is to

A. 32%
B. 34%
C. 50%
D. 68%
E. ...

46. Arithmetic mean is to mode as $\sigma$ is
to

A. range
B. median
C. midscore
D. D
E. Q

47. Median is to point as standard
deviation is to

A. volume
B. distance
C. square
D. score
E. area







48. Positive correlation is to direct as
negative correlation is to

A. incomplete.
B. inconsequential.
C. incorrect.
D. inadequate.
E. inverse.

49. Spearman is to $\rho$ as Pearson is to

A. r
B. ...
C. S
D. M
E. Mdn

50. Rank is to order as score is to

A. disorder.
B. magnitude.
C. median.
D. rank.
E. variability.



APPENDIX 


B' 


A Simplified Item-Analysis Procedure 


A. Preparing the Items 

The two characteristics usually determined for a test item are difficulty and
discrimination. How hard is the item for the group tested, and how well does it
distinguish between the more able and the less able students? These two aspects
of an item are nearly independent of each other, the exception being that a very
easy or very hard item cannot discriminate well. If all testees mark the item
correctly, it has not separated the testees into two groups, the passers and the failers.
Likewise, if all mark it incorrectly (or if only a chance proportion of examinees
mark it correctly), the item is non-discriminating for the group.
oTiH°,n ^ paragraphs, a simple method for analyzing items is presented 

^ ^ considerable detail Preferably, the test or subtest should contain 

i ° w example, four-option multiple-choice) There should be 

f items-say, arbitrarily, 60 or more-and they should 
wnrlr-? hclf ^ ^ Substantial number of persons Thus the procedure 

tn exarnmations prepared cooperatively by several teachers and 

ckss S tests m a single school 

S.w / has some value, hovvever 

anv strongly encouraged to answer eiery item for which he has 

concemmg a smgle one of the 
Item ^ be a\ailable for every eTammee to eveiy 

This can^e^done^fa?r?'^^°ff ^ nearly as possible m ascendmg order of difficulty 
mastered LmousIv^L bee-e, or d the item has been ad- 

Subiective estimates^ nf r fT” a^**'*?' nngmal difficulty values may be used 
ranS by toe >>nsed upon the a 4 age of mdependent 

than an or Lniig made by 0^™“,!^^“^ npprosimate the actual ranks better 

117 U9cmSy‘'‘“ ™"‘ *“ '"“i «■» material on paeea 

*See the discus-sion on pages 153-154 



SIMPLIFIED ITEM-ANALYSIS


of the «rd, c,«,cr A .cpar.fa ansJr ker^ri ueJS '”' 

to tlTu constructed nith great care, special attention being given 

and er .r. 1 (called distracters, deco>s, or foils) It should then bo kej ed 

Sw ^ ,i U tndependenti!, b) each person helping to devise 

^ ^ . »s c^treroeb important and should be followed bj a detailed 

onfcrcnce to reconcile differences remove ambiguities and discard items that can- 
not be reii^wf profwrli Though time-consuming, a cooperative approach to test 
of increasing retmbilitj and validity and improving the morale 


Statistical item analysis is no substitute for meticulous care in planning, constructing,
criticizing, and editing items. It does supplement that intuitive process, however, by
revealing unsuspected defects or virtues of the specific items.


B. A Measure of Discrimination


After the test has been given, score the papers or answer sheets by marking with
a red pencil all items incorrectly answered or omitted. Because of the instructions
concerning omissions they should be few. Each pupil's score will be the sum of his
red marks; the smaller the better.
Divide the papers or answer sheets into three piles as follows:


1. Arrange the N papers by score, beginning on top with the smallest number
(best score) and going on down to the largest number (poorest score).

2. Multiply N, the total number of testees, by 0.27 and round off the result to
the nearest whole number, or look in Table 46 on pages 448-450 for the
appropriate figure, called n there.

3. Count off the n best papers from the top of the stack. This is the “high” group.

4. Count off the n poorest papers from the bottom of the stack. This is the “low”
group.

5. Put aside the middle group (approximately 46 per cent of the papers), since
it is not used in the item analysis.

6. Set up a form somewhat like Table 42, with Item Number, $W_L$, $W_H$,
$W_L - W_H$, and $W_L + W_H$ headings.

7. $W_L$ is the number of persons in the low group who answered a certain item
wrongly, including those who omitted it. It represents the total number of red
marks for that item in the low group.

8. $W_H$ is the number of persons in the high group who answered the item wrongly,
including those who omitted it. It represents the total number of red marks for
that item in the high group.

9. $W_L - W_H$ means “$W_L$ minus $W_H$” for a given item. $W_L + W_H$ means
“$W_L$ plus $W_H$” for that item.
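
The piling and counting steps above can be sketched in a few lines of code. This is only an illustrative sketch, not the author's procedure as printed: the data layout (one list of 0/1 red marks per paper), the function names, and the ten sample papers are all assumptions made for the example. Ties in total score at a group boundary are broken by the order in which the papers happen to be listed.

```python
from typing import List, Tuple


def group_size(n_testees: int) -> int:
    """Step 2: 27 per cent of N, rounded to the nearest whole number."""
    return int(0.27 * n_testees + 0.5)


def item_analysis(red_marks: List[List[int]]) -> List[Tuple[int, int, int, int, int]]:
    """Return (item number, W_L, W_H, W_L - W_H, W_L + W_H) for every item.

    red_marks[p][i] is 1 if paper p has a red mark (wrong or omitted)
    on item i, and 0 otherwise. This layout is an assumption of the sketch.
    """
    n = group_size(len(red_marks))
    if n == 0:
        raise ValueError("too few papers to form high and low groups")
    ranked = sorted(red_marks, key=sum)      # step 1: fewest red marks on top
    high, low = ranked[:n], ranked[-n:]      # steps 3-4; the middle is set aside
    rows = []
    for i in range(len(red_marks[0])):
        w_l = sum(paper[i] for paper in low)    # step 7
        w_h = sum(paper[i] for paper in high)   # step 8
        rows.append((i + 1, w_l, w_h, w_l - w_h, w_l + w_h))  # step 9
    return rows


# Ten hypothetical papers on a four-item test (invented data).
papers = [
    [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1],
    [1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 1, 1],
    [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1],
]
rows = item_analysis(papers)
for row in rows:
    print(row)
```

With 243 testees, as in the illustrative analysis later in this appendix, `group_size(243)` gives the 66 papers per group mentioned there.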

ThekreerlFr - >7//'’ the more discnminotinKporver the itnin has J“”ditmg 
Jho larcer „ tj,,. ,,enis from lent discruninatins-and therefore 

r:j:rt7eerl7ruZito^^^ 

ma, be nns-keyed, a,nh,«nons, or .nnekted n. 
content U% the rest of the test ^ j, 

eon::ie7d\uTS; IZZlatm. may be determined from lable 46 on page 44S 



TABLE 42

The 100 Items in a Five-Option Multiple-Choice Teacher-Made Test
Arranged According to Discriminating Power, from the
Least Discriminating to the Most Discriminating

Column headings: Item Number; Rank Order of Item According to Discriminating
Power (1 = Poorest Discrimination); W_L; W_H; W_L - W_H (Discrimination);
W_L + W_H; Estimated Percentage of Examinees Who Did Not “Know” the Correct
Answer to the Item* (Difficulty).

Item    Rank    W_L    W_H    W_L - W_H    W_L + W_H    Difficulty
 30       1     31     40        -9           71           67
 35       2      1      2        -1            3            3
 27       3     38     39        -1           77           73
 38       4      0      0         0            0            0
 31       5     38     37         1           75           71
 34       6     28     26         2           54           51
 42       7      4      1         3            5            5
 72       8     17     14         3           31           29
 32       9      5      0         5            5            5
 60      10      5      0         5            5            5
 29      11      9      4         5           13           12
 39      12     14      9         5           23           22
 94      13      6      0         6            6            6
 45      14      8      1         7            9            9
 28      15     49     42         7           91           86
  8      16     58     51         7          109          103**
 33      17     31     23         8           54           51
 44      18     10      1         9           11           10
 51      19     10      1         9           11           10
 86      20     10      1         9           11           10
 92      21     16      7         9           23           22
...      22     13      2        11           15           14
...      23     16      5        11           21           20
...      24     24     13        11           37           35
 57      25     26    ...        11          ...          ...
 46      26     12      0        12           12           11
...      27     13      1        12           14          ...
...      28     15      3        12           18           17
...      29     15      2        13           17          ...
...      30     17    ...        13          ...          ...
...      31     48     35        13           83          ...
...      32     54     41        13           95           90
 76      33     15      1        14           16           15
 68      34     16      2        14           18           17
 59      35     17      3        14           20           19
 19      36     22      8        14           30           28
 62      37     18      3        15           21           20
 70      38     21      6        15           27           26
 61      39     28     13        15           41           39
 43      40     37     22        15           59           56

* $\frac{100\,o}{2n(o - 1)}(W_L + W_H) = \frac{100(5)}{2(66)(4)}(W_L + W_H) = 0.947(W_L + W_H)$


TABLE 42 (Continued)

Item    Rank    W_L    W_H    W_L - W_H    W_L + W_H    Difficulty
  6      88     39     11        28           50           47
 96      89     36      7        29           43           41
...      90     40     11        29           51           48
 85      91     52     23        29           75           71
 73      92     32      2        30           34           32
 99      93     43     13        30           56           53
 87      94     38      7        31           45           43
 93      95     54     23        31           77           73
 15      96     37      5        32           42           40
 88      97     37      2        35           39           37
  2      98     43      8        35           51           48
 12      99     41      4        37           45           43
 90     100     49      8        41           57           54



Then high-low group data for every option of each unsuitably discriminating item
may be secured to aid in the editing process.


C. A Measure of Difficulty

The larger W_L + W_H is, the harder the item was for the group tested. W_L + W_H
may be multiplied by a constant, $\frac{100\,O}{2n(O - 1)}$, to obtain an estimate of the
difficulty of the item, corrected for chance;* here O is the number of options each item
has. This approximates the percentage of the testees who did not "know" the
correct answer. Items in the revised test should be arranged according to W_L + W_H,
from lowest (easiest) to highest (hardest).
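
The difficulty estimate can be sketched in a few lines of Python. This is only an illustration: it assumes the constant is 100·O / [2n(O − 1)], which reproduces the 0.947 multiplier quoted for n = 66 and O = 5, and the function name is hypothetical.

```python
def difficulty_percent(w_low, w_high, n_per_group, n_options):
    """Estimate the percentage of examinees who did not 'know' the answer.

    Multiplies W_L + W_H by 100*O / (2n(O - 1)), the chance-corrected
    constant assumed in the lead-in.
    """
    constant = 100.0 * n_options / (2.0 * n_per_group * (n_options - 1))
    return constant * (w_low + w_high)


# Item 90 of Table 42: W_L = 49 and W_H = 8, with 66 examinees per group
# and five options per item.
item_90_difficulty = difficulty_percent(49, 8, 66, 5)
print(round(item_90_difficulty))  # 54
```

The result agrees with the 54 per cent reported for Item No. 90.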


D. An Illustrative Analysis

Table 42 shows W_L, W_H, W_L − W_H, W_L + W_H, and difficulty values for each
of the 100 items on an English final examination constructed by four college
instructors and administered with "do-guess" instructions to 243 college freshmen.
The item numbers have been rearranged, the least discriminating one first.
Since there are 66 persons in the low group and 66 in the high group, the maximum
possible value of W_L − W_H is 66 − 0 = 66; the minimum possible value is
0 − 66 = −66. These figures would not occur except because of mis-keying, however,
for even by chance about 1/5th of the examinees who attempted the item would mark
it correctly. Thus the highest value for W_L − W_H, barring mis-keying or extreme
misinformation, is 66 − (1/5)(66) − 0 = 52.8.

* For methods of correcting item difficulty indexes for "chance," see Frederick B.
Davis, "Item Selection Techniques," in E. F. Lindquist (ed.), Educational
Measurement, pages 267-285. Washington, D. C.: American Council on Education, 1951.

In practice, most items will probably have at least a little positive discriminating
power. The largest W_L − W_H value in Table 42 is 41. It will be helpful to examine the
two extreme items (Numbers 30 and 90) carefully. Let us take the least discriminating
item first:

30. In preparing a speech the first step is to choose a subject. The speaker should


A. practice
B. collect material
C. choose gestures
D. select main points
E. phrase a thesis


Responses of the high and low groups (66 testees in each) were as shown in
Table 43. The keyed answer was B, "collect material," but E, "phrase a thesis,"
appealed more to those students who earned high scores on the test as a whole.
Options A and C ("practice" and "choose gestures") were practically useless, since
they decoyed only 3 of the 132 persons. Option D, "select main points," discriminated
in the proper direction, 8 to 18. The item as a whole was fairly difficult, since
67 per cent of the freshmen did not "know" the correct answer.


TABLE 43

Number of Examinees in High and Low Groups Who Chose Each Option of
Item No. 30

                        Option                     Number of
Group       A     B     C     D     E    Omit      Examinees
High        1    26     0     8    31      0           66
Low         1    35     1    18    11      0           66
Totals      2    61     1    26    42      0          132
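
The option-by-option reading of Table 43 can be sketched in Python as follows. The counts are those of Table 43, with the low-group count for Option E taken as 11 so that the row sums to 66; the function and variable names are illustrative assumptions, not part of the original procedure.

```python
OPTIONS = ["A", "B", "C", "D", "E", "Omit"]
HIGH = [1, 26, 0, 8, 31, 0]   # high-group counts for Item No. 30
LOW = [1, 35, 1, 18, 11, 0]   # low-group counts (E taken as 11; see lead-in)
KEY = "B"


def analyze_item(options, high, low, key):
    """Return W_L, W_H, W_L - W_H, and distracters favoured by the high group."""
    key_index = options.index(key)
    w_low = sum(low) - low[key_index]      # wrong or omitted, low group
    w_high = sum(high) - high[key_index]   # wrong or omitted, high group
    suspicious = [
        name
        for name, h, l in zip(options, high, low)
        if name not in (key, "Omit") and h > l
    ]
    return w_low, w_high, w_low - w_high, suspicious


w_low, w_high, discrimination, suspicious = analyze_item(OPTIONS, HIGH, LOW, KEY)
print(w_low, w_high, discrimination, suspicious)  # 31 40 -9 ['E']
```

A distracter chosen more often by the high group than by the low group, as Option E is here, is the kind of option the editor would examine first.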


By using the above information, the speech teacher may be able to salvage the
item without destroying its main point. He would try to determine why Option E
attracted the better students. This may indicate the need for additional classroom
instruction concerning steps in preparing a speech, or it may highlight a real
conflict between B and E as the correct answer for the item. If the dilemma can be
resolved, new distracters will then be devised to replace ineffective Options A and C.
On the other hand, it may not be feasible to retain the item. Not all poorly
discriminating questions can be revised successfully. Sometimes the point tested is
not clear or defensible enough to serve as the basis for an item. Therefore, more
items for each part of the test outline should be prepared than will be needed in the
revised test, so that virtually unrevisable items may be discarded. How many
excess items are needed depends upon the nature of the test, purposes for which
it will be used, and the care devoted to initial construction and editing. Some items,
such as those concerning vocabulary, are much easier to prepare well than are
others, such as civics questions.






Now let us turn to the most discriminating item, No. 90:

"Humanity is the mould to break away from, the crust to break through, the
coal to break into fire, the atom to be split" is a quotation from
A. John Dos Passos
B. Carl Sandburg
C. Robinson Jeffers
D. Kenneth Fearing
E. Sherwood Anderson

Numbers of responses to the various options are shown in Table 44; the keyed
answer is C, "Robinson Jeffers." Note that all four distracters (A, B, D, E)
discriminate in the right direction and reasonably well; each is more attractive to the
low group. Approximately 54 per cent of the testees did not "know" the correct
answer. This item does not need any editing.
How many of the items should be edited on the basis of option information like
that contained in Tables 43 and 44? Probably most of them could be improved in
this manner, especially by the substitution of better distracters for nonfunctioning
ones, but the labor involved in this process is too great for most teachers unless
only a small portion of the items are scrutinized. A rule-of-thumb procedure would
be to edit the 25 per cent least discriminating items. For Table 42, where there are


TABLE 44

Number of Examinees in High and Low Groups Who Chose Each Option of
Item No. 90

                        Option                     Number of
Group       A     B     C     D     E    Omit      Examinees
High        0   ...    58     3     2    ...           66
Low         1   ...    17    15    11    ...           66
Totals      1    24    75    18    13      1          132


100 items, this involves taking the first 25 items, whose W_L − W_H values are less
than 12. For these 25 items information like that contained in Table 45 would be
drawn up by two conscientious persons (even high school students) working together.
Then the subject-matter experts would edit carefully on the basis of the
discrimination, difficulty, and option information given in Tables 42 and 45.
To provide you material upon which to practice editing, the 25 least discriminating
items are presented herewith. They still contain spelling and typographical
errors which appeared on the test itself. Of course, these should have been removed
by careful proofing of the stencils before mimeographed copies were run off.

30. Already appears on page 441.
35. The subject you choose for a talk should be
A. one that is wholly new to you
B. one that interests you but about which you know nothing
C. anything for which you can find material
D. anything which will fill the required time
E. one that interests you and about which you already know something















TABLE 45

Number of Examinees in High and Low Groups Who Chose Each Option of
the 25 Least Discriminating Items











































TABLE 45 (Continued)

Columns: Item Number; Group; Option (A, B, C, D, E, Omit); Number of
Examinees (Check Column)

S3 

U 

L 

t 

0 


21 

0 

o 

1 

3 

0 

1 

GO 

66 

44 

a 

L 

67 

S6 

0 

1 

1 

S 

0 

1 

0 

0 

0 

0 

60 

GO 

51 

n 

L 

0 

2 

67 

56 

1 

0 

«> 

0 

1 

0 

0 

1 6G 

GO 

86 

11 

L ^ 

0 

1 

0 

2 

0 

0 

4 

65 
, 56 

1 1 

1 

C6 

00 

92 , 

II 

L 

0 

3 

0 

0 

1 

5 

11 

59 

50 

1 

1 

oO 

60 

81 

H 

L 

0 

0 

0 

0 

1 

1 

12 

61 

53 

0 

0 

66 

66 

14 

II 

1 

61 

1 

3 

0 

0 

60 


L 

9 

50 

2 

4 

1 

0 


74 

H 

3 

53 

2 

6 

2 

0 

CG 


L 

G 

12 

G 

9 

2 

1 


57 

H 

9 

0 

4 

51 

2 

0 

66 


L 

11 

1 = 

5 

40 

6 

2 

' 6G 


27. In a panel discussion the members of the panel
A. deliver prepared speeches
B. ask questions of the audience
C. provide informal discussion for audience
D. answers questions of the moderator
E. speak in rotation from right to left

38. The best material for a speech
A. holds interest and develops thesis
B. bores audience but develops thesis
C. pleases the speaker but annoys the audience
D. pleases audience but ignores purpose of speech
E. is unreliable but develops thesis

31. A speaker whose purpose is to instruct should begin by
A. showing why the information is needed
B. telling a funny story
C. stating his thesis
D. putting a diagram on the board
E. stating his qualifications to speak

34. The fundamental process under which is included methods of organizing a
speech is
A. adjustment to the speaking situation



B. articulation
C. choice of material
D. symbolic formulation and expression
E. phonation

42. Good posture for a speaker should be
A. rigid and stiff
B. oratorical and pompous
C. comfortable and natural
D. odd and unusual
E. lax and undisciplined

72. In "American Letter" MacLeish expresses
A. loyalty to a foreign land
B. disgust with American industry
C. disgust with American tradition
D. loyalty to America

32. In ... a speaker should try to find one which
C. will annoy his audience
D. will require little preparation
E. he has seen in a popular magazine

60. "Roan Stallion" was written by
A. Oswald Spengler
B. Aldous Huxley
C. Robinson Jeffers
D. Leonard S. Brown
E. George Boas

... the best contribution who

i" Ce“ol:;S:o’'S.oh.3pro>-U 

? SeTSlen co.mCn, pent, of ™w 

39 The best Kmd Lugrand feel good 

A make the authente u g subject 

94 Themam.deamthesel ^ the land md 

’I shoa the confl.=t a d » 'aben- 

g rr|;XUs£pav thept^^^^ . 

K Til nifipr to avoid stage y 

^ .ttitnde to the audienee 

c to d.iplay a ' don ‘ „verbearin|! attitude 

I To toidS- 




28. The round table method of public discussion is suitable for groups of not more
than:

A. 100. 

B. 50. 

C. 25. 

D. 15. 

E. 5. 

8. One of the following is a run-on sentence. That sentence is: 

A. Dinner was served, and we ate rapidly.

B. Some of the people are waiting, others have gone ahead. 

C. This is the problem ; the solution is clear. 

D. He paused, adjusted his tie, and rang the bell; the maid refused to open
the door.

E. Whistles, sirens, horns, and firecrackers broke the silence, and the New
Year was born.

33. Barnes emphasises the Four Fundamental Processes of Speech. Of these the
first is 

A. phonation. 

B. adjustment to the speaking situation. 

C. choice of material. 

D. control of bodily activity. 

E. projection to the audience. 

44. Best advice in developing a lively sense of communication is 

A. be energetic, speak with enthusiasm.

B. be passive, apathetic. 

C. appear reluctant to meet the assignment. 

D. avoid looking directly at the audience. 

E. speak with an outburst of oratorical display. 

61. Steinbeck’s The Grapes of Wrath pictures primarily: 

A. the shiftlessness of the average American form worker. 

B. the plight of the Western tenant farmers and migratory workers. 

C. the effect of Communist propaganda on poor tenant farmers. 

D. the immorality of the ignorant migratory workers.

E. the lack of religious faith among American farmers.

86. A boy whose fascination with a machine later turned to fear was found in 

A. Roan Stallion. 

B. Tractored Off. 

C. R.U.R. 

D. Our Changing Characteristics. 

E. Mr. Mechano. 

92. The Wright Brothers' home was

A. on Albermarle Sound. 

B. at Fort Meyers. 

C. in St. Petersburg, Florida. 

D. on the coast of North Carolina. 

E. on Hawthorne Street in Dayton, Ohio. 

81. President Truman in his Fordham Address said the one defense against the 
atom bomb lies in 

A. bigger and better fighter planes. 

B. an adequate air raid warning system. 

C. aggressive warfare. 

D. making a stronger U. N. 

E. mastering the science of human relationships. 



14. Which sentence is best punctuated?

A 1 hi' c no pend, and Ido not „„t one 

B I ha^c no pend, and 1 do not want one 

c Ihav-enopcncil and I do not nant one 
D Ihatcnopcnd-andldonotwantone 
n- '“'““P'nn'’ “"d I do not want one 
74. Karel Capek, author of "R. U. R.," is a
A. Frenchman
B. Czechoslovakian
C. American
D. Pole
E. Italian

57. Lippman's Problem of Unbelief seeks primarily to
A. show how peace of mind can be achieved
B. denounce the creeds of the leading Protestant denominations
C. show that Christianity is a dying religion
D. determine the causes of modern man's lack of religious faith
E. improve the morals of American youth


Complete option information for high and low group responses to these 25 poorly
discriminating items is provided in Table 45, where bold-faced type indicates the
number of persons marking the keyed option.
The 20 speech items (Numbers 26-45) make up only 20 per cent of the entire
test, yet 14 of these (70 per cent) appear among the 25 least discriminating items.
The first 25 items in the test cover spelling, grammar, and punctuation. Only 2 of
these (8 per cent) appear in Table 45. The last 55 items deal with literature; 9 of
these (16 per cent) were poor discriminators. It is obvious, then, that from the
percentage standpoint about 5 times as many speech items as non-speech ones seem
unsatisfactory (70 versus 14 per cent).
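
The percentages in this comparison can be checked with a short Python sketch. It assumes the mechanics section contains 25 items (2 of 25 is the 8 per cent quoted, and 25 + 20 + 55 = 100); the section labels are illustrative only.

```python
# (items in section, items among the 25 least discriminating)
SECTIONS = {
    "mechanics": (25, 2),    # spelling, grammar, and punctuation
    "speech": (20, 14),
    "literature": (55, 9),
}


def percent_poor(n_items, n_poor):
    """Percentage of a section's items that were poor discriminators."""
    return 100.0 * n_poor / n_items


speech_items, speech_poor = SECTIONS["speech"]
other_items = sum(n for name, (n, _) in SECTIONS.items() if name != "speech")
other_poor = sum(p for name, (_, p) in SECTIONS.items() if name != "speech")

speech_rate = percent_poor(speech_items, speech_poor)
other_rate = percent_poor(other_items, other_poor)
print(round(speech_rate), round(other_rate), round(speech_rate / other_rate, 1))
# 70 14 5.1
```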

There are several possible explanations for the poor showing of the speech items.
First of all, they make up only a fifth of the test and therefore do not have much
weight in determining each testee's total score. If speech ability is little related to
the other components of the test, then no matter how carefully prepared the
speech items are, most of them will seem to discriminate poorly.
A second, rather likely explanation is that good speech items are quite hard to
construct, while the more standard phases of English are easier to test. Some of the
speech items (Numbers 35, 38, 42, 32, 29, 45, and 44) are too easy and cannot
discriminate well. Only 2 (Numbers 34 and 33) are near the optimal difficulty level
of 50 per cent.
A third possibility, related to the second, is that the speech specialist was a less
competent item writer than the other three staff members, or that he exercised
less care than they. All four persons were experienced teachers but novices at
constructing objective questions.


E. A Discrimination Table

The process of item analysis can be simplified by reference to Table 46. Knowing
the number of persons tested, one immediately reads n, the proper number for the
high or low group. Then, for the number of options each item in the test has,
find the minimum value of W_L − W_H needed in order to conclude that the
item has significant discriminating power. In the above example, where N was 243,
n is seen to be 66, and the minimum W_L − W_H for acceptable discrimination with



TABLE 46

Table for Determining Whether or Not a Given Test Item Discriminates
Significantly Between a "High" and a "Low" Group*

(W_L = number of persons in the low group who answered the item incorrectly or
omitted it. W_H = number in the high group who answered the item incorrectly or
omitted it.)


(W_L − W_H) at or above Which an Item Can Be
Considered Sufficiently Discriminating



















TABLE 46 (Continued)

Columns: Total Number of Persons Tested (N); Number in Low or High Group (0.27N);
(W_L − W_H) at or above Which an Item Can Be Considered Sufficiently Discriminating,
by Number of Options: 2 (True-False or Two-Option Multiple Choice), 3, 4, 5




13 

14 

14 




13 



343- 31G 

93 


13 




91 


14 




95 


14 

11 


354- 357 

96 


14 

14 

14 

35S- 361 

97 


14 

14 


362- 364 

93 


14 

14 

14 

36^ 36S 

99 


14 

14 

15 

360- 372 

100 


14 

14 

15 

                           Number of Options
N              n        2      3      4      5
406-409       110      14     15     15     15
443-446       120      14     15     16     16
480-483       130      15     16     16     16
517-520       140      15     16     17     17
554-557       150      16     17     17     18
591-594       160      16     18     18     18
628-631       170      17     18     19     19
665-668       180      17     19     19     19
702-705       190      18     19     20     20
739-742       200      18     19     20     20
832-835       225      19     21     21     21
925-927       250      20     22     22     23
1017-1020     275      21     23     23     24
1110-1112     300      22     24     24     25
1480-1483     400      25     27     28     28
1850-1853     500      28     30     31     31
3702-3705    1000      39     43     44     44


* Values for this 2½ per cent level-of-significance table, which is based upon Stanley's
4 per cent level table (see American Psychologist, 6:369, July, 1951), were computed
by Miss Ellen V. Piers.


five-option items is 12.* Purely by accident, this makes the same number of items,

* This is 12/66 = 18 per cent of the maximum possible W_L − W_H for an n of 66. If n
were 200, a difference of only 20 (just 10 per cent of the maximum) would be needed
for a five-option item to be significantly discriminating. With an n of 1,000, only 4.4
per cent of the maximum possible difference is required for significance. Therefore,
when n is rather small, as it usually is for teacher-made tests, a considerable number
of items will be branded improperly as nondiscriminating. As an extreme example, take






25, eligible for editing as were obtained by the 25 per cent rule of thumb.
If n calcuhtiiig machine is atailable -^1) *''5'^"“'''* 

L^+ratrtWmc i^tticafnceo^ 

points on the ditricultj ^ „„t for the middle-iliIBculti item 

thehoundan line of a icrj ‘t™ 1“ ^ The foniiiilas for obtom 

TABLE 47

Formulas for Finding (W_L + W_H) Values at Three Difficulty Levels

Percentage of Testees
Who Do Not Know the             Number of Options Each Item Has
Correct Answer to
the Item                      2           3           4           5
16                         0.160n      0.213n      0.240n      0.256n
50                         0.500n      0.667n      0.750n      0.800n
84                         0.840n      1.120n      1.260n      1.344n

As shown in the first footnote, when O = 5 and n = 66 the last column of that table
is obtained by multiplying each W_L + W_H by 0.947. For five options and 66 testees
in each group, the 16 per cent difficulty point of Table 42 occurs when
W_L + W_H = 0.256n = (0.256)(66) = 17. Similarly, the 50 per cent point occurs when
W_L + W_H = (0.800)(66) = 53.

?rrt^sTin^ra‘rd3W^ 

17 of the too Items in Table se 

whole IS relatively easy — ^ 


01 ine iva--- 

hole IS relatively easy TTTTth^w ^ 

, Table 42 oa pages 
■all d uninmalmg one’ 




F. Obtaining the Mean and the Standard Deviation

By using the sum of the W_L + W_H column of the item-analysis table, it is rather
easy to estimate the average "wrongs" score of the N testees. To secure this mean
wrongs score not corrected for chance, simply add the W_L + W_H column and
divide this sum by 2n:

$$\bar{W} = \frac{\sum (W_L + W_H)}{2n}$$

To correct the mean wrongs score for chance, use the following formula:

$$\bar{W}_c = \frac{O \sum (W_L + W_H)}{2n(O - 1)}$$

where O is the number of options each item has.
The standard deviation of the wrongs scores is the same as the standard deviation
of the rights scores when omits are counted as being wrong, and this statistic, not
corrected for chance, is easily estimated by means of the formula

$$s = \frac{\sum (W_L - W_H)}{2.45n}$$

Note the minus sign in this formula. To secure a standard deviation corrected for
chance, multiply the above formula by O/(O − 1):

$$s_c = \frac{O \sum (W_L - W_H)}{2.45n(O - 1)}$$

For illustrations, turn back to Table 42, page 438. There the W_L + W_H column
sums to 4,088, and the W_L − W_H column total is 1,762. 4,088 divided by 2n
equals 4,088/132, or 31.0, the mean number of incorrect responses, uncorrected for
chance, for the 100 items.

(5 × 4,088) ÷ (2 × 66 × 4) = 38.7, the mean wrongs corrected for chance. This
figure agrees rather well with the mean of the right-hand (difficulty) column of
Table 42, which is 39.0. Thus on the average the correct answer to the 100 items
was "known" by about 61 per cent of the examinees and not "known" by about
39 per cent.

The standard deviation not corrected for chance is 1,762 ÷ (2.45 × 66) = 10.9,
exactly the same as when computed from all 243 total scores. Corrected for chance
it becomes (5)(1,762) ÷ (2.45)(66)(4) = 13.6.

These approximation formulas based upon only low and high groups are accurate
enough for use in most school situations, particularly when the number of testees
is fairly large, say 100 or more.
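
As a check on the arithmetic, the four estimates can be sketched in Python. The function names are illustrative assumptions; the column totals (4,088 and 1,762), n = 66, and O = 5 are the figures quoted above for Table 42.

```python
def mean_wrongs(sum_total, n_per_group, n_options=None):
    """Mean wrongs score from the W_L + W_H column total.

    Pass n_options to apply the correction for chance, O / (O - 1).
    """
    mean = sum_total / (2.0 * n_per_group)
    if n_options is not None:
        mean *= n_options / (n_options - 1.0)
    return mean


def sd_wrongs(sum_diff, n_per_group, n_options=None):
    """Standard deviation estimated from the W_L - W_H column total."""
    sd = sum_diff / (2.45 * n_per_group)
    if n_options is not None:
        sd *= n_options / (n_options - 1.0)
    return sd


print(round(mean_wrongs(4088, 66), 1))     # 31.0
print(round(mean_wrongs(4088, 66, 5), 1))  # 38.7
print(round(sd_wrongs(1762, 66), 1))       # 10.9
print(round(sd_wrongs(1762, 66, 5), 1))    # 13.6
```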


G. A Simplified Procedure for Obtaining a
Reliability Coefficient


Unfortunately there is no fairly precise method for securing a single-form
reliability coefficient ("coefficient of equivalence") without considerable computation.
Stanley has devised two shorter procedures yielding results closely approximating
those of the conventional methods. His simplified split-half technique has been
reported elsewhere¹ in considerable detail and will not be repeated here. Instead,
a Kuder-Richardson Formula 20 (KR20) r will be obtained from just the low and
high group figures in Table 42, page 438, K being the number of items:

$$\mathrm{KR}_{20} = \frac{K}{K - 1}\left[1 - \frac{2n\sum (W_L + W_H) - \sum (W_L + W_H)^2}{0.667\left[\sum (W_L - W_H)\right]^2}\right]$$

$$= \frac{100}{99}\left[1 - \frac{2(66)(4{,}088) - 227{,}630}{0.667(1{,}762)^2}\right] = \frac{100}{99}\left(1 - \frac{311{,}986}{2{,}070{,}798}\right) = \frac{100}{99}(0.849) = .86,$$

which is the same as the value secured by using the regular KR20 formula with all
243 cases. In general the abbreviated procedure yields slightly lower r's, but for
most practical purposes this negative bias will be negligible.
The only part of the above formula not used in computing either the mean or the
standard deviation is Σ(W_L + W_H)². To get it, square each of the 100 (W_L + W_H)
values in Table 42 and then sum them: (71)² + (3)² + (77)² + ... + (57)² =
227,630. By hand this is a laborious process indeed, but on an electric calculating
machine (Friden, Marchant, Monroe) it is simple to secure both Σ(W_L + W_H) and
Σ(W_L + W_H)² in a single set of operations. In most medium-sized and large school
systems there is a machine of this sort and someone who knows how to use it.
Getting all three needed values for the formula and checking them should take a
skilled operator not more than half an hour, if an item-analysis table similar to
Table 42 has already been prepared.

This method does not involve splitting the test into halves, a tedious undertaking
at best when a test-scoring machine is not available. The split-half coefficient of
equivalence based upon all 243 testees is .87; it is also .87 when determined from
only the high and low groups. KR20 coefficients tend to be a little smaller than
split-half ones,* but again the discrepancy is usually of no practical consequence
to the teacher.

¹ Julian C. Stanley, "A Simplified Method for Estimating the Split-Half Reliability
Coefficient of a Test," Harvard Educational Review, 21:221-224, Fall, 1951.

* Lee J. Cronbach, "Coefficient Alpha and the Internal Structure of Tests,"
Psychometrika, 16:297-334, September, 1951.
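
The abbreviated KR20 computation can be sketched in Python as shown below. The formula is the one reconstructed from the worked figures in this section (K = 100, n = 66, Σ(W_L + W_H) = 4,088, Σ(W_L + W_H)² = 227,630, Σ(W_L − W_H) = 1,762); the function and argument names are assumptions made for illustration.

```python
def kr20_from_groups(n_items, n_per_group, sum_total, sum_total_sq, sum_diff):
    """Approximate KR20 from high- and low-group column totals.

    sum_total    -- sum of the W_L + W_H column
    sum_total_sq -- sum of the squared W_L + W_H values
    sum_diff     -- sum of the W_L - W_H column
    """
    item_part = 2.0 * n_per_group * sum_total - sum_total_sq
    variance_part = 0.667 * sum_diff ** 2
    return (n_items / (n_items - 1.0)) * (1.0 - item_part / variance_part)


r = kr20_from_groups(100, 66, 4088, 227630, 1762)
print(round(r, 2))  # 0.86
```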



APPENDIX C


Scoring Rearrangement (Ranking)
Test Items


In many subjects, especially history, where one of the objectives is to acquire a
sense of sequence or chronology, rearrangement questions may be a better testing
device than other item forms. Their construction calls for considerable skill in
putting together material so that the ranking task will not be too demanding at
some points and too easy at others. Comparatively little has been written about
the preparation of this type of item, though much has been said about scoring it,
but in all likelihood there are potential uses for rearrangement items in many
academic fields. Steps in solving a mathematical problem, sequences of equations
in chemistry, and relative quality of various literary selections are possible
illustrations. An essential condition is that "experts" can agree reasonably well as to
how the things should be ranked, since otherwise it will not be possible to devise a
satisfactory key. Independent keying by at least two teachers is desirable, but such
precautions should by no means be confined to rearrangement items.

pxfim^inpoo*L°^ presented m Table 19 on page 95 and scored for two 

A littlp roflo the Spearman rank-difference coefficient of correlation, rho 

betwEpn '?» make it obvious that the magnitude of each discrepancy 

ffiSr Ts J^st the fact that they 

think that tho Frpn historian s point of \new, it is far worse to 

place it ,u2 nlo before the Roman Empire fell than to 

wTonl^^to r^fp destruction of the Spanish Armada Similarly, it is 

cork and balsa cavier than white oak than to confuse the densities of 

h-fitera scoring rearrangement items has discouraged most 

n^formThk One -V t ^ ^^ble is available, the task i, 

students the rho (p) between the 

this P"ge 95 and to multiply 

certam ^ the testee whose rankings on a 

0X6 = 0 pomts credit fnr iw V bis rho on another such item is 0, he gets 
P credit for that item By being completely misinformed and 


>«imng a rho of -1, he could earn f-l) V r. - ‘ 

Item ts just about the onh t\T)e that fcilivs fn,« r ^'ea-AngeiDent 

But usmg the fomuh^wre 1 m(o account 

>alucs of p are treated acTTl,. “t ‘tags ranied) X rho, the aanons 

ssr.=ste5£S-=a3s 


TABLE 48


ΣD² Table for Scoring Rearrangement Items
Score = (Number of Things Ranked) × (Rho)²
(Look up ΣD² in Appropriate Column)



To score a rearrangement item, simply obtain the ΣD² "in your head" by
comparing the student's responses with a conveniently arranged key. This ΣD² is used
to enter the column of the table for the proper number of things ranked, from which
the score is read directly at either the left or the right. All ΣD² values for N = 2
through N = 7 are contained in Table 48. Tables for N's higher than 7 can be
prepared rather easily, but it does not seem wise to construct these and thereby
encourage teachers to devise more complex and less homogeneous rearrangement
items than are usually desirable.

Turn back to Table 19 on page 95 for two illustrations of how Table 48 is used.
There Richard Roe's ΣD² is 40 for 6 events ranked. Looking in the 6 column of
Table 48, we find that 40 lies within the 26-44 ΣD² range, for which the score is 0.
Richard gets no points at all for his inaccurate responses to the item. Had we
employed the formula S = Nρ, he would have received 6(−1/7) = −6/7 = −1.

John Doe, the other testee for whom ranks are listed in Table 19, had a ΣD²
which lies within the 6-8 interval of Table 48 ...

The Nρ² scoring method usually yields scores closer to 0 than
the Nρ procedure. It seems somewhat more defensible on statistical grounds.
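
The two scoring rules can be compared with a small Python sketch. It assumes that rho is the usual Spearman rank-difference coefficient, 1 − 6ΣD² / [N(N² − 1)], that the Nρ² score keeps the sign of rho, and that scores are rounded to whole points; the function names and the sample rankings are hypothetical.

```python
def sum_d_squared(key_ranks, student_ranks):
    """Sum of squared differences between keyed and student ranks."""
    return sum((k - s) ** 2 for k, s in zip(key_ranks, student_ranks))


def rho(sum_d2, n):
    """Spearman rank-difference coefficient for n things ranked (no ties)."""
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


def score_n_rho(sum_d2, n):
    """Score = N x rho, rounded to the nearest whole point."""
    return round(n * rho(sum_d2, n))


def score_n_rho_squared(sum_d2, n):
    """Score = N x rho squared, keeping the sign of rho (an assumption)."""
    r = rho(sum_d2, n)
    return round(n * r * abs(r))


# Richard Roe: sum of D squared is 40 for 6 events ranked.
print(score_n_rho(40, 6), score_n_rho_squared(40, 6))  # -1 0

# A hypothetical student who transposes two adjacent pairs.
d2 = sum_d_squared([1, 2, 3, 4, 5, 6], [2, 1, 3, 4, 6, 5])
print(d2, score_n_rho_squared(d2, 6))  # 4 5
```

Under these assumptions any ΣD² from 26 through 44 scores 0 for six things ranked, and a ΣD² of 6 or 8 scores 4, which matches the intervals cited from Table 48.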







APPENDIX D


The Computation of Square Roots


When determining a standard deviation or an r, one must find the square root
of a number. This can be done easily from a table of square roots, such as Barlow's
Tables of Squares, Cubes, Square Roots, Cube Roots and Reciprocals (London:
E. & F. N. Spon, Ltd.). Quite a few statistics texts and other books have shorter
tables. If only an occasional square root is needed, it can probably be secured more
easily "by hand" than by searching for a table.
In Chapter 15 of Mathematics Essential for Elementary Statistics (New York:
Henry Holt and Company, 1951), Helen M. Walker devotes 16 pages to explaining
what square roots mean and how they are secured. The reader who is thoroughly
in the dark concerning this topic will want to consult that reference. For others
who need merely a little reviewing of material previously learned, the following
brief explanation is offered.

1. First take a small three-digit whole number that is a perfect square; 144 is a
good illustration. The square root of 144 is a number which, when multiplied by
itself, equals 144. To extract the square root of 144 follow these seven steps:


Work after steps (a) through (g):

          1  2.
       √  1^44.
         -1
    22 |  0 44
         -  44
             0

(a) Write the 144 with a square root sign and two decimal points, one above the
other.

(b) Begin with the decimal point following 144 and move to the left two digits at
a time, putting a caret at each stopping place. With 144 only one move and
therefore one caret is needed.

(c) Look at the number to the left of the caret. What number multiplied by itself
is as nearly equal to 1 as possible but not greater than 1? 1, of course, so write
this 1 above the 1 and below it. Subtract.
(d) Draw down the next two numbers.
(e) Double the top 1 and write it to the left of 0 44.

(f) Now, how many times does 2 go into 04? 2, so write 2 in the answer space above
the right-hand 4 and also to the right of the 2.

(g) Multiply the 22 by 2, write this product (44) below the other 44, and subtract.
Therefore, the square root of 144 is exactly 12, since 12 × 12 = 144.

2. Take a large decimal fraction, 9342.156, and find its square root to the nearest
two decimal places.


Work after steps (a) through (f):

               9  6. 6  5  4
            √ 93^42.15^60^00
             -81
      186 |   12 42
             -11 16
     1926 |    1 26 15
              -1 15 56
    19325 |      10 59 60
                - 9 66 25
   193304 |         93 35 00
                   -77 32 16
                    16 02 84


(a) First write down the number with the square root sign, carets, and a decimal
point in the answer place. (Notice that this decimal point is always exactly
above the decimal point in the number.) Begin at the decimal point in the
number and count in both directions by twos, putting a caret between each
pair. Zeros are added to the right of the decimal point beyond the last figure
in order to have two numbers to draw down each time. In order to carry
out the square root to the nearest two decimal places (rounded off from three
places), it is necessary to have six figures to the right of the decimal point.

(b) What number multiplied by itself is as nearly equal to 93 as possible, without
exceeding it? 10 × 10 = 100, which is too much. 9 × 9 = 81, so use 9 as the
first number in the square root. Multiply it by itself and subtract the 81 from 93.

(c) Draw down the next two numbers (42), double 9, and write 18 to the left of
12 42.

(d) Approximately how many times will 18 go into 124? Not quite 7, for 18 × 7
= 126. Try the next lower number, 6. Write it in the answer space above the
2 and also to the right of the 18. Multiply 6 by 186 and subtract this product,
11 16, from 12 42.

(e) Draw down the next two numbers and double the 96. 19 goes into 126 about
6 times. Repeat the above process.

(f) Double 966, write 1932 in the proper place, and complete the remaining steps.

The square root of 9342.156 is 96.654+, which when rounded off to the nearest
two decimal places becomes 96.65. Where test scores are concerned, only one
decimal place is usually needed for the standard deviation. Also, in the denominator
of the r formula on page 92, extraction of the square roots to the nearest three
figures is sufficient for most purposes.

To check the computation of a square root, multiply the value obtained by itself
and add to this product the remainder. For example, (96.654 × 96.654) + 0.160284
= 9342.156000, which agrees exactly with the figure underneath the square root
sign in Step (a) on page 457.
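
The decimal example and its check can be sketched the same way. The code below is an illustration only, written in Python; the helper names `longhand_sqrt` and `sqrt_to_places` are assumptions, not anything defined in the text. Adding zeros beyond the last figure is done here by shifting the decimal point two places for every decimal place wanted in the answer.

```python
from decimal import Decimal


def longhand_sqrt(n: int) -> tuple[int, int]:
    """Digit-by-digit square root of a whole number: (root, remainder)."""
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits
    root, remainder = 0, 0
    for i in range(0, len(digits), 2):
        remainder = remainder * 100 + int(digits[i:i + 2])  # draw down a pair
        doubled = 20 * root                                  # doubled root
        d = 9
        while (doubled + d) * d > remainder:
            d -= 1
        remainder -= (doubled + d) * d
        root = root * 10 + d
    return root, remainder


def sqrt_to_places(value: str, places: int) -> tuple[Decimal, Decimal]:
    """Square root of a decimal number carried to `places` decimal places.

    Hypothetical helper: it shifts the decimal point 2 * places to the right
    (the added zeros), extracts the whole-number root, and shifts back.
    Digits beyond 2 * places decimals in `value` are dropped.
    """
    scaled = int(Decimal(value).scaleb(2 * places))
    root, remainder = longhand_sqrt(scaled)
    return Decimal(root).scaleb(-places), Decimal(remainder).scaleb(-2 * places)


root, remainder = sqrt_to_places("9342.156", 3)
print(root, remainder)                      # 96.654 0.160284
print(root * root + remainder)              # 9342.156000, the check described above
print(root.quantize(Decimal("0.01")))       # 96.65
```

Working with whole numbers and shifting the decimal point afterward avoids floating-point rounding, so the check reproduces the original figure exactly.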



APPENDIX E

Answers to Questions in Appendix A


c. 3,3 3H, 310 317 318 

D. For ex'xmolo 310-310 likewise Ihc diffpience between the 

nnd 310-a total of U - (3W + t) 51 - C310 - 0 5, - 

upper aotl louer “real eb<s Iroits 
319 5 - 309 S » 10 

A. Pec obua e _ ^ j, 259 5 + (10/2) - 200 5 - 

1 B. (2G0 + 2C9)/2 “ 520/2 

aO/2) « 20-1 5 A f •>,- cb^. 260-260 whose d i> 0 

i u. “ 

2„'‘U3 = 2G15 


c. 


50/2 = 2j 259 5 + 


(25_-2J)„oi 


,239, + (10/12) = 200 3 Sim, 


,.„uount, top aown-eheeV, 2095. 
209 6 - (110/12) = 


( 2 irl ^)(.01 = 


D. The mode - 
largest figure 
= 264 5 

D. i of 50 19 




239 6 + (75/8) = 3 _ ,« . 249 5 - 0 0 = 218 9 

. The75tl.perc™t‘l'“‘° j„,50e 125 Coum 
i -=' 7gr“8N ‘7- 2,95 - (45/6' = 2795 - 75 - 2,20 
2795 -(-T - 209 5 + 

37 5 frequencira 269 5 + b / 

check, eount up 37 5 10 9 

(15/61 - 269 6 + 25 




10. B. Q3 is the 75th percentile, found to be 272.0 in Question 9, above, and Q1
is the 25th percentile, 248.9 in Question 8. (272.0 - 248.9)/2 = (23.1)/2
= 11.55 ≈ 12

11. C. 1/10 of 50 is 5. 239.5 + (0)(10) = 239.5. Counting down, 9/10 of 50 is 45.
239.5 - (0)(10) = 239.5

12. C. 50 - 0.9(50) = 5. Count down 5 or count up 45. Obviously, it is easier
to count down 5. Counting up is useful as a check, though. 299.5 -
[(5 - 3)/4](10) = 299.5 - (20/4) = 294.5. Thus the 90th percentile is the
midpoint of the 290-299 class; it lies exactly halfway within that class,
5 units below the upper real limit and 5 units above the lower real limit.
Check by counting up: 289.5 + [(45 - 43)/4](10) = 289.5 + 5 = 294.5

13. B. Use the answers to Questions 12 and 11 in solving this. 0.4(294.5 - 239.5)
= 0.4(55) = 22.0

14. F. 10 × √(50(239) - (-15)²)/50 = √(11,950 - 225)/5 = √11,725/5 ≈ 22

15. B. 105 - 95 = 10

16. C. 99.5 + ½(109.5 - 99.5) = 99.5 + ½(10) = 104.5. To check: 109.5 -
½(109.5 - 99.5) = 109.5 - 5 = 104.5


17. A. There are two essentially different reasons for grouping scores: computational
(to reduce labor) and graphical (to emphasize important features of
the data). One of the best discussions of the latter aspect is contained in
Truman L. Kelley's Fundamentals of Statistics, Chapter IV, "Graphic
Methods." Cambridge, Massachusetts: Harvard University Press, 1947.

18. A. It is necessary to know the range before performing the operations set forth
in Options B, C, D, and E.

19. E. For instance, into which of the two classes, 44-48 and 40-44, would you
put a score of 44?

20. D. The 60th percentile is defined as the point in a distribution below which lie
60 per cent of the scores and above which lie 40 per cent of the scores.

21. D. When arranged in numerical order, these scores are 4, 4, 5, 6, 7. The
middle score (midscore) is 5.


22. C. In numerical order these scores are 44, 46, 46, 46, 63, 68, 68; their
midscore is 46, which has three numbers on each side of it. A frequency
distribution of these scores with an interval of 1 is as follows:

Score    f
 68      2
 63      1
 46      3
 44      1

The median of this distribution is found by counting half the way up or
down the frequency column. The total frequency is 7, and half of 7 is 3.5.
Counting up through the 43.5-44.5 class uses only 1 frequency, but counting
through the next (45.5-46.5) class involves 1 + 3 = 4 frequencies,
more than the 3.5 required to locate the median. Thus the median is
45.5 + [(3.5 - 1)/3](1) = 45.5 + (2.5)/3 = 45.5 + 0.83 = 46.33. To check:
46.5 - [(3.5 - 3)/3](1) = 46.5 - (0.5/3) = 46.5 - 0.17 = 46.33. Therefore,
the discrepancy between the median and the midscore in this distribution
is 46.33 - 46 = 0.33, which illustrates the fact that the midscore
and the median of a distribution may have different values. Usually the
difference is slight, however.
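
The counting procedure in the answer to Question 22 can be written out as a short routine. This is a sketch in Python under stated assumptions: the function name `grouped_median` and the choice to describe each class by its midpoint are illustrative, not taken from the text.

```python
import statistics


def grouped_median(freqs: dict[float, int], width: float = 1.0) -> float:
    """Median of a frequency distribution by counting up half the frequencies.

    `freqs` maps each class midpoint to its frequency; `width` is the class
    interval. Both names are illustrative assumptions.
    """
    half = sum(freqs.values()) / 2
    below = 0                                    # frequencies counted so far
    for midpoint in sorted(freqs):
        f = freqs[midpoint]
        if f > 0 and below + f >= half:
            lower_real_limit = midpoint - width / 2
            # Interpolate within the class that contains the median.
            return lower_real_limit + (half - below) / f * width
        below += f
    raise ValueError("the distribution has no frequencies")


scores = [44, 46, 46, 46, 63, 68, 68]
distribution = {44: 1, 46: 3, 63: 1, 68: 2}

print(statistics.median(scores))                 # midscore: 46
print(round(grouped_median(distribution), 2))    # median: 46.33
```

The midscore comes from the ungrouped scores, while the median treats the three scores of 46 as spread evenly over the class 45.5-46.5, which accounts for the difference of 0.33.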

23. C. (4 + 5 + 7 + 6 + 4)/5 = 26/5 = 5.2

24. C. Only two of the five options (C and E) contain measures of central tendency;
the standard deviation, quartile deviation, and range are measures of
variability. Since the arithmetic mean is a function of every score in the
distribution, its value would reflect "the undue influence of a few extreme
values." Whether the highest paid worker made $5,000 or $50,000 is wholly
inconsequential so far as the size of the median is concerned.


2» E. 
26 U. 


27, C, M 


M + 


>215W 

iV 


^ , r4o-(-0 5)lX U7X0)-K3X1)1 

tu T oJ * T 7 + 3 


-13+"-^ = 15-l-)2=27 

28. A. It shows the range of the middle 80 per cent of the scores in the
distribution.

29. A. Q3 is the 75th percentile, and the median is the 50th percentile. Twenty-five
per cent of the scores lie between these two points, since 75 - 50 = 25.


30.
Score    d'    fd'    fd'²
  4       2     6     12
  3       1     0      0
  2       0     0      0
  1      -1     0      0
  0      -2    -6     12

Σfd' = 0     Σfd'² = 24

i × √(NΣfd'² - (Σfd')²)/N = 1 × √(10(24) - 0²)/10 = √240/10 = 15.5/10 = 1.55

         1  5. 5
       √2^40.00
        1
  25 |  1 40
       -1 25
 305 |    15 00
         -15 25
            -25




The size of the interval of each class is 1, the classes 2.5-3.5 and 0.5-1.5
having frequencies of 0. For convenience the assumed mean, M', was put
at 2, the midpoint of the 1.5-2.5 class, which is the center of the distribution.
It may be put anywhere else, of course, without altering the answer.
31. Q = (75th percentile - 25th percentile)/2
= [(3.5 - 0.25) - (1.5 + 0.67)]/2 = (3.25 - 2.17)/2 = 1.08/2 = 0.54

32. C. 2.5 + 0.75 = 3.25
Check: 3.5 - 0.25 = 3.25


33. z = (Score - Mean)/SD = (70 - 80)/20 = -10/20 = -0.5

Therefore, this individual is half a standard deviation below the mean of
the group with which he was tested.

34. E. Three tied scores of 95 occur. If there were no ties, these three places would
have ranks of 4, 5, and 6. Since one score of 95 is as good as another, we
assign the average of 4, 5, and 6 (which is 5) to each of the three scores.
Note that 4 + 5 + 6 = 15, the same as the sum of the new ranks,
5 + 5 + 5 = 15. Whether or not ties occur, the sum of a certain number
(N) of consecutive ranks beginning with 1 will always be [N(N + 1)]/2.
If there are 9 ranks, as in this question, their sum will be (9 × 10)/2 = 45.

35. B. As noted above with reference to Question 34, the method of assigning to
tied scores the average rank that would have occurred without ties keeps
the sum of untied and tied sets of ranks of the same length identical. Since
the sum is unchanged, the mean, which is the sum divided by N, the number
of ranks, is also unchanged. It will always be (N + 1)/2.
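
The rule for tied ranks in the answers to Questions 34 and 35 can be illustrated with a few lines of Python. This is a sketch only: the function name `average_ranks` is an assumption, and apart from the three scores of 95 in fourth, fifth, and sixth place the nine scores are hypothetical.

```python
def average_ranks(scores: list[float]) -> list[float]:
    """Rank scores from highest (rank 1) down, giving tied scores the
    average of the ranks they would have occupied without ties."""
    ordered = sorted(scores, reverse=True)
    places: dict[float, list[int]] = {}
    for place, score in enumerate(ordered, start=1):
        places.setdefault(score, []).append(place)
    return [sum(places[s]) / len(places[s]) for s in scores]


# Hypothetical scores; only the three tied 95's come from the question.
scores = [99, 98, 97, 95, 95, 95, 90, 88, 85]
ranks = average_ranks(scores)
n = len(scores)

print(ranks)                        # [1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 7.0, 8.0, 9.0]
print(sum(ranks), n * (n + 1) / 2)  # 45.0 45.0
print(sum(ranks) / n, (n + 1) / 2)  # 5.0 5.0
```

The sum of the ranks equals N(N + 1)/2 and their mean equals (N + 1)/2 whether or not ties occur, as the two answers state.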

36. B. See Chapter 3.

37. A.

38. C. An r of 0 has the least possible predictive value. The closer to 0 r gets,
regardless of sign, the poorer prediction becomes.

39. E. The r discussed in Chapter 3 simply cannot be greater than +1.00 or
less than -1.00 except when computational errors are made.

40. A. Each tally mark represents a pair of scores. There are N pairs of scores in
all. The number of cells in a 12 × 12 scatter diagram is 144, but some of
these will probably be blank, while others will have more than one tally.
See Table 18, page 92, which has 15 × 14 = 210 cells, 33 of which contain
the N = 43 tallies. 210 - 33 = 177 of the cells are empty.

41. B. The arithmetic mean is a measure of central tendency; the standard
deviation is a measure of variability.

42. C. Q3 is the 75th percentile; the median (Q2) is the 50th percentile.





43. h. Both the mean and the standard deviation are based upon all scores in the
distribution; the median and Q are both percentile measures. Also, the mean
is used with the SD, while the median is used with Q. The analogy is: A
certain kind of measure of central tendency is to a similar sort of measure
of variability as another kind of measure of central tendency is to a similar
sort of measure of variability.

44. E. A frequency distribution has a median but usually no midscore, while
ungrouped measures have a midscore (the middle score if the number of
scores is odd, or the average of the two middle scores if the number is even).

45. U. 50 per cent of all the measures in a distribution always lie between Q1 and Q3.
In a normal (so-called "bell-shaped") distribution, 68 per cent of all cases
lie within one standard deviation of the mean.

46. A. The arithmetic mean is the most reliable measure of central tendency, the
mode the least reliable; the standard deviation is the most reliable measure
of variability, the range the least reliable.

47. It. The standard deviation is a linear distance along the base line of a frequency
distribution.

48. E. When correlation is positive, high scores on one test tend to go with high
scores on the other test, while low scores tend to go with low scores. This is
a direct relationship. When correlation is negative, high scores on one test
go with low scores on the other, and vice versa. This is an inverse
relationship.

49. A. Spearman derived the formula for the rank-difference coefficient of
correlation, rho, while somewhat earlier Pearson had derived the basic formula



APPENDIX F

Publishers of Standardized Tests


The following list includes every test company for whom five or more tests are
indexed on pages nOO-llOO of Oscar K. Buros (Editor), The Fourth Mental
Measurements Yearbook. Highland Park, New Jersey: Gryphon Press, 1953.
The number of tests covered in the 1953 yearbook is shown in bold-face type
following each address. An asterisk (*) preceding the name indicates that the
company "issues catalogues devoted entirely or in large part to tests" (Buros,
page 1100).

*Acorn Publishing Co., Inc., Rockville Centre, New York (21)

*Australian Council for Educational Research, 147 Collins St., Melbourne, C.1,
Australia (13)

Benton Review Publishing Co., Inc., Fowler, Indiana (11)

*Bureau of Educational Measurements, Kansas State Teachers College of Emporia,
Emporia, Kansas (19)

*Bureau of Educational Research and Service, State University of Iowa, Iowa City,
Iowa (14)

*Bureau of Publications, Teachers College, Columbia University, New York 27,
New York (11)

*California Test Bureau, 5916 Hollywood Blvd., Los Angeles 28, California (29)

*Center for Psychological Service, George Washington University, Washington, D.C.
(3)

College Entrance Examination Board, 425 W. 117th Street, New York 27, New
York (18)

*Cooperative Test Division, Educational Testing Service, Princeton, New Jersey (59)

Division of Educational Reference, Purdue University, Lafayette, Indiana (6)

Educational Records Bureau, 21 Audubon Ave., New York 32, New York (10)

*Educational Test Bureau, Educational Publishers, Inc., 720 Washington Ave., S.E.,
Minneapolis, Minnesota (30)

Educational Testing Service, Princeton, New Jersey (19)


*C. A. Gregory Co., 345 Calhoun St., Cincinnati 19, Ohio (5)

*George G. Harrap & Co., Ltd., 182 High Holborn, London W.C.1, England (11)

*Houghton Mifflin Co., 2 Park St., Boston 7, Massachusetts (14)

Joint Committee on Tests, 132 W. Chelton Ave., Philadelphia 44, Pennsylvania (7)

National League of Nursing Education, Inc., 2 Park Ave., New York 16, New
York (5)

*Ohio Scholarship Tests, Ohio State Department of Education, Columbus, Ohio (29)

Personnel Research Institute, Western Reserve University, Cleveland, Ohio (9)

*Psychological Corporation, 522 Fifth Ave., New York 18, New York (51)

Psychological Service Center Press, 1275 New Hampshire Ave., N.W., Washington
6, D.C. (6)

Psychometric Affiliates, Box 1625, Chicago 90, Illinois (5)

*Public School Publishing Company, 509-513 North East St., Bloomington, Illinois
(12)

*Science Research Associates, Inc., 57 West Grand Ave., Chicago 10, Illinois (37)

*Sheridan Supply Co., P.O. Box 837, Beverly Hills, California (9)

Turner E. Smith & Co., 441 West Peachtree St., N.E., Atlanta 3, Georgia (8)

*Stanford University Press, Stanford, California (13)

State High School Testing Service for Indiana, Purdue University, Lafayette,
Indiana (33)

Steck Co., Austin 1, Texas (6)

*C. H. Stoelting Co., 424 North Homan Ave., Chicago 24, Illinois (3)

*University of London Press, Ltd., Little Paul's House, Warwick Square, London
E.C.4, England (12)

*Vocational Guidance Centre, 371 Bloor St. W., Toronto 5, Canada (12)

*World Book Company, 313 Park Hill Ave., Yonkers-on-Hudson 5, New York (41)



Author Index 


Adams, Eunice, 294 
i^ED?ro'h‘i-cf55,I02.19..245 

'^lror.!’S;5™'4’.“t7C,100,2.4.«. 
Anusta'i, Anne, 219_. 348 
Anderson, C. J , 303 

Anderson. Gordon y . 371 

Anderson, Harold A., Uo. 2^ 

Anderson, Scarvia B., 152, 32b 
ArbucUe, Dugald ^ J71 
Aribtollc, 8, 0, 20 

Arkin, Hubert, -73 
Arnold, Dwight L, 143 
Arthur, Grace, 422 
Ashbaugh, E. J . 4l, 123 
Ashburn. Robert R., 193 
Asher, 277 

Avent, Joseph E , 13 J .. 

AjTCs, Leonard i’., 3S, 3J, 

Bain, Alexander, 62 
Baker, ^krtliur 0., 168 

Bale^ VatlU 254 

Bamberger. Sutcr Clara ri» 

Barrett, E. R-. 182 
Barton, W. A* Jr, 17 
Bass, Bernard M . 421 

Bajless, Ernest E, 174 
Beard, Charles A , 4 
Beauchamp. 'V ^ » 

Bebcll, Clifford, 366 

lsWc;,\?4 
Bp^airau, A. Cop"''™ ' 245 . 371, 

B.ou'tl, George K, 134, 240. 

Bemon, Artlmr L. 398 

Berdio, IUI11I1 1 - 
Bcpksh.re, 3“"“! iVrl 345 
Bolta, Kmio'tt AlO'Pl, 

Bells, Gilbert I, .183 354 358 362 

BilleU. Hoy O . W 

g7fl29';Vo%s“2?284,290,35..4.8 


Bingham, Walter V., 3o0 
Bixler, Harold H , ISO 
Black, Max, 14 

Blair, Glenn Mjers, 331, 334, 345 
Bledsoe, Joseph C , 371 
Blm, 32 

Blommers, Paul, 22 
Bobbitt, Joseph M , 341 
Book, Wdbam F , 321 
Bordin, Edward, 144 
Dorgersrpde. Fred von, 221-224 
Bonne. Edwin 0 , 6, 15, 2o, 33 
Boss, Mabel E , 406 
Bowles, Frank H., 194 
Bojd, Gertrude, 34o 
BojCP, Ph;”'? A - VS'im’m 
Boynton, Paul L 
Brandenburg, G. C , 49 
Brenner, Benjamin, 32o, 33b 

Badges, Claude r., 246 

Bnnion, W. G . 247. 271, -73 
Brown, Cbra M , 183 
Brown, Edwin J , 402 

BrowD, r W , 187 
Brown Francis J , 31o, 3ir 
Brown, G L , 284 

Brown, JudsonS , 327 

Brown TVilham, 123. 124 
Brown, Woodrow A .24b 

Browne, Arthf D , 3»s 142 

Brownell, Wdbam A.. H, ' 

167. 224 329, 341 

Bnibacher. John s., 15, 20 

Brueckner, I^o J , 126. 33b 

Burns, Bob, 150 

Burnside, Carobn J » j’o, 121, 134, 19 
Buros, Oscar, 54, 55. 

219, 245. 464 

Burt,C>"*-®V ,40 

Buswell, Guy T., 34U 
CadMll. Dotothj H B , 245 



CaUin, Allen, 51 

Cameron Dale C , 341 

Camp, 309 

Campbell Pera, 340 

Cane, V R , 327 

Cantnl Hadlej,52 

Carmichael, Leonard, 47 

CanoU, John B , 135, 219 

Carter, Ralph E , 19S, 200 

Cattell, J McKecn, 30, 33 52, 115 

Cattell, Ps^che, 2S4. 2S5, 28S 

CatteU, R B , 52 

Chadwick, E , 38 

Chalmers, T M , 314 

Chambers, E G , 104 

Charters W W , 337 

Cha<=e, W P . 326 

Chaimcy , Henn, 300, 371 

Chave, Ernest J , 59 

Childers Leon hi , 41 

Clark, Champ 19 

Clark, Edward L , 124, 348 

Clark, mills W , 175 

Cobb, E B , 245, 371 

Cobb, Irvin S , 19 

Cochran R05 E , 194, 196, 203, 204 
Cohen, I Bernard, 25 
Cole, Robert D , 221-224 
Coleman William, 245, 371 
Colton, Raymond R , 273 
Compton, R K 321 
Conant, James Br\aDt, S 
ConidiD, Edmund S , 352 
Conneau, A % 163 
Conner, 310 

Conrad Herbert S 51, 52, 120, 297 
Cook, Stuart W , 25, 51, 39S 425 
Cook, Walter W 59, 166, 200, 300, 327, 
345. 365 

Coombs Cljde H , 424 

Cooper, D H , 398 

ComeU, Ethel L , 359 

Cornell, Francis G ,'394 395, 396, 398 

Comog, J , 168 

Courtis, Stuart A , 29, 38, 134, 219 
Cowdery, Karl M , 51 
Crawford, Albert Beecher, 371 
Crawlord, John R , 291 
Crespi, Leo P , 52 
Cres<!ey, Paul F , 27 

Cronbach, Lee J , 23, 55, 124, 154, 162 
180, 245, 300, 345, 416, 423, 453 
Crow, Alice, 365 
Crow, Lester D , 365 
Crowder, Korman A , 418 
Cureton, Edward E , 135, 298, 417, 420, 

Curtis, F D , 174 


Dailej John T , 398 
Daly, Joseph F , 102 
Damebon, Paul J , 372 
Darle> , John G 278, 371, 415 
Darlmg, C 174 
Darwin, Charles, 8 


Davenport, K S , 110 
Davis, Allison. ^8, 422 
Davis, Frederick B , 22, 25, 55, 120, 134, 
135, 154, 157, 100, 1G2, 371, 422, 440 
Davia, Georgia, 341, 342 
Davis, R , 17G 

Davis Robert A , 25, 139, 164, 107, 348, 
421 

Doming, W Edwards, GO 
r)eput>,E C 317,319 
Dcutsch, Morton, 2o, 51, 39S, 425 
Dewej, John. 4, 11, 25, 357 
Diamond, Leon N , 119 
Diodench, Paul D , 238 
Diwn, mifred J , 105 
Dolbcar, Amos E , 352 
Domas, Simeon J . 398, 418 
Doppelt, Jerome E , 418, 425 
Dougliton, Isaac, lb 
Douglass, Harl R , J2 i 
Drcssel, Paul L , 120, 134, 140, 345 
DuBots, Philip H , 1J4 
Dunlap, Jack W , 234 
Dunlap, Knight, 37 
Dunn, Leslie C , 8 
Durant, Hill, 10 
Durost, Walter N , 240, 300 
Durrcll, Donald D , 334 ^ 

D>cr, Henrj B , 326 
Djkcroa, I\arl W , 119 


Eaton, Merrill T , 21 
Ebbinghaus, Hermann, 30, 31, 34 
Ebel, Robert L , 191 
Edmiston, R W , 201 
Edwards, Alien L , 105 
Edwards, I Newton, 347 
Lells, Kenneth W . 278, 422 
Lelb, Walter Crostij, 414 
Cinstcm Albert, 7, 25 
Elhott, Edward C , 40 
Ellis, Albert, 51, 206 
EUis, Robert S , 365 
EUis, Robert W , 398 
Ellwood, Charles A , 10 
Flsbree, Hillard S , 408, 415 
Elwell, Marj, 120, 336 
Engel Thelbum L , 363 
k,ngelhardt, N L Jr , 397 
Engelhardt, N L , Sr , 395 
Engelhart, Max D , 49 134, 165 
Enckson, Chfiord E , 371 
Eunch, Alvin C , 165 311, 377, 395 
h Vans, Robert O 407,415 
Eysenck, Haas J , 52 
Ezell, L B 191 


Pabre, Jean Hcnn, 9 

halls J D , 40 

Fan Chung-Teh, 134 

Farlej , Belmont Mercer, 402, 403 

Fechner, Gustav T , 31 

Ferguson George A , 125 

Femald, G G , 47 

Femald Grace M , 345 

Ferrell, Guy Y , 88 





Tula, Silvio, 7 
I inch, Frank H , 182, 258 
1 indlc), \\ftrrcn G , 131 
i iih, Lom« J , 105 
i”i«hcr, GtorRC, 38 
higher, Ronald A , 8, 88 
nanagin, John C , 52, 134, 14G, 204, 206, 
215, 274,288, 300 
n^ilitr. Mane ^V., 304 
lolcj.J D,372 
I olcj, John P., Jr , 34S 
(oik, SB, 304 
lorl mo, George, 310 
I ottltr, Frtd M , 300 
Franzen, Rajinond, 208 
Frcdcriksrn, Norman. 246, 300, 371 

1 rccmin, Irank N , 30 

Freeman, Frank S , 55, 13o, 3d. 

Froehch, Clifford P , 371, 41o 
Furfej, Paul H , 102 

Gable, Sister Fehcita, 310 
Gage, N L , 200 
Galen, 17 

cSup.'GTOredn.SI jg j, 33 35 40, 

Garter: Enc I 4,5011,2, 300 
Garrett, narle> i , SO . 

Garrett, H'"'! ^{,','’''137 
Gatos, Arthtir I , 'SJ, 00', 

Gorboricb, !la,n'0"“,2 • a 
Gilbert, Arthtir W , 36a 
Glli.briat,i;ototS,41a 

GiUe<pie, b . II , ^ 

Gillil'ind. A R 
GU*ef, 

Goddard Jlenr> H . 33 

Goheen, Howard , 55. loz 
Goldcnw ciser, ’l27 

Good, Carter \ , lo. ji, jg2, 191 

Goo<ieaoa|;lt.norcnceL,3J,63. 
Goo^sen, Carl t , 44. 

Gordon, Ham C , 22, 4Ui> 

Gordon, \1 E ■ l®^, , 

Grambs, Jean E , «» 

Gray, J Staidey, U 
Green, Bert F , Jr , 200 
Grcenbern, Jo'* ,77 

^recna H A 1|3. % 

Gregory, C . 

Grimes, James 4\ 

Guilford, J F , 100 , ^ ,3, 

GiiUifcen, Harold, 25, 00, >0 

Hagen, Etobcth P , 418 
TT<a( 7 ffard, Ernest A , 27» 

l”irS sS“!'46,50,52,404 

Hall: Wilber, 299 
Hangcn. 24a 


Harap, Henrj , 34S 
Harlow, Harrj F , 327 
Hams, Albert J , 294 
Harry, David P , Jr , 168, 189 
Hart, 49 

Hartley, Hei^ H , 406 
Hartshorne, Hugh, 47, 48 
Hartung, ilaunce, 327 
Hawkes, Herbert E , 150, 156, 165, ISO, 
185, 186, 197 
Hawkinson Mabel J , 336 
HeaU, William, 35 
Hell, Walter G , 110, 111, 286, 2S7 
Heim, Ahce W , 327 
Heimann, Robert A , 371, 422 
Heims, H , 288 
Heisenbei^, W , 132 
Helmholtz, Hermarm, 8, 31 
Henrj, Ljle K , 311 
Henry, Nelson B , 191, 206 
Henslej , Iven H , 139, 164, 197 
Hemck James B , 12 
Hertz, H R.ll 

Hevner, Kate, 174 ,, -e i-m oqq 

Hildreth, Gertrude H , o4, 58, 130, 2SS, 
333, 337, 338. 345, 366 
Hilgard, Ernest R , 327, 373 

403,409 

HSIimpS, Augeitiis D . 266 

Holbngsncrtb. UU S , 234 

aM'k’Jf.,236,237 

Herat, Psif 186 
Howerth, I '' . 8 
Hubbard, Henrj D , 247 
HudelsoD, 43 

HS!’CHrkL,54,351,352 
Hullec, C E , 43 

llS?L»^2^n22,326 

Irion, Arthur 1 , 327 

K'i&rS,401 

Jaclson, Rg«rt W ^120 246 

Jta,MS:25,61,393,425 

SlSifw Stanley 30,31 

Johnson, Bess b , oil 

Johnaoa, FranUm " • 3® ,, 

Jones, L'le. 

Jones, Vernon 47 





Jordan, A M , 162, 191, 245 
Judd, Charles H , 368 

Kandel. Isaac L , 38, 193, 400 
Katz, Pavid, 9 
Kavruck, Samuel, 55, 162 
Kav, Marjorie E , 337 
Kelley, Truman L , 52-54, 56, 115, 116, 
124, 132, IGl, 170, 229, 273, 290, 290, 
298 

Kelley, V H , 17G 

Kelvin, Lord, 7 

Kemble, Edmn C , 25 

Kendall, Maurice G , 105 

Kepler, Johann, 7 

Kejs, ^oel, 22,311,312, 330 

IvkaNT dm, Omar, 4 

ICilpatnck, Milliam H , 17, 18, 356 

King, Harold V , 422 

King, Ronold, 7 

Kinne>, L B , 165 

Kirkpatnek, James Earl, 310 

Kitch, Loran V , 309 

Kostick, Max M , 29, 152, 20G 

Kraeplin, Emil, ^ 

Kruglov, Lorraine, 152 
Kuder, G F , 51, 124, 129, 219, 416, 417, 
419, 421 
Kugle, 810 

KuhlxDann, Fredenck, 33, 182, 2o8, 2SS 
Kulp, Hamel H , 22, 312 

Laird, Donald A , 49 
LiminoQ, £ ,<321 
Lao^mr, Charles R , 103, 134 
Langmuir, Imng, 133 
LiWler, Eugene S ,249 
Learned, IN ilUam o , 350 
Lear; , B E , 141 

Lee, J Murra>, 164, 165, 180, 197 

Lee, Richard E , 4 

Lefever, D NVelty, 371 

Lehman, Harvey C , 51 

Leiter, Russell G , 422 

Lennon, Roger T , 278, 287 

Lentz, Theodore F , Jr , 47, 154 

Leonard, E A., 240 

Leonard, J Paul, 331, 398 

Leonard, Sterling A , 171 

Lev, Joseph 105 

Lewm, Lillie, 246 

Ligon, Ernest M , 228, 230 

Lmcoln^ Abraham, 375, 415 

Lindquist E F , 22, 25, 55, 58, 105, 119. 

121, 127, 135, 139, 142, 150, 154, 156, 

160, 162, 165, 180, 185, 186, 191 197 

206, 245, 274. 295, 300, 327, 345, 346. 

371, 417, 440 » » » 

Lmdvall, Carl M , 398 
Lmdzey, Gardner, 176, 190, 244, 421 
Lockhart, Aileene, 22 
Locy, William A , 8 
Longstaff, H P 311 
Lord Frederic 42o 
Loi^e Irving 2o 135 152, 278 
I undholm H T 168 178 


McAllister, Jane E , 22 
McCall, William A , 17, 44, 53, 54, 227, 
279, 295, 301, 302, 377 
McClelland, David C , 327 
McConn, Max, 28 
McConnell, James, 51 
McCullough, Constance M , 345 
McGeoch, Jobn A , 327 
AIcGregor, J B , 195 
McKinne% , H T , 30-1 
McNamara, NN alter J , 135, 160, 102, 181, 
191 

McNemar, Qumn 32, 52, 91, 108, 110, 
111,278,284,280 287,429 
Mackenzie, Goraon N . 300 
\IacKinnon, Donald W , 327 
Madison James, 402 
Mailer, Julius Bernard, 300 
Malthus, Thomas R « 10 
Mann C R , 150, 156, 105, ISO, 185, ISO, 
197 

Mann, Horace, 29, 38, 45 
Mann, WMliam A , 345 
Marconi, Guglielmo, 11 
Marston, William , 49 
Martin, Abe, 19 
Massej , I rank J , Jr , 105 
^iathe^, Kirtlej F , 0 
Mathen-s, C 0 , 49 
Maxficid, Francis N , 67, 58 
Maxwell, James Clerk, 11 
Mav, Mark A , 47, 48 
%Ia\er, Joseph R , 9 
Melb>, Ernest 0 , 365 
Mendel, Gregor J . 8 
Menninger, Karl A , 299 
Merrill, Maud A., 34, 282, 288-290, 351 
Messenger, Helen R , 304, 406 
Meumann, Fnist, 31 
Mejer, George, 324 
Mejer, Max, 39 
MiHer, W S . 287 
Milb, Lewis H , 168 
Miner, James B , 51 
Mitchell, Claude, 319 
Modley, Rudolph, 273 
MoUenkopI, William G , 371 
Monroe, Walter S , 43, 44, 49, 56, 58, 59, 
113, 114, 165, 198, 200, 297, 325 
Moodj , Caesar B , 59 
Moore, Bruce V , 51, 54 
Morgenstern, Oskar, ^3 
Momsett, L N , 407 
Mort, Paul R , 394-396 
Mosiet, Charles I , 

Mowrer, O Hobart, 327 
Muller, Johannes, 8 
Murph/, Gardner, 348 
Myers, George E , 368 
Myers, M Claire, 

Neale, M G , 404 
Nelson M J , 183 
Nerazek Claude L , 2S4 
Neumann, John Von, 423 
Newcomb, Theodore 327 





Kewcns, Lyndall Fisher, 196 
Ncttland, I* Ernest, 337 
Ne;\man, Sidney H , 311 
Nixon. Belle M . 29, 152, 206 
Noll, Victor 11^311 
Norton, John K , 249 
Norvell, Lee, 321 
NowUs Vincent, 327 

Odell, C W , 50, 105, 155, 162, 191, 200 

Oden, Mehta H , 300 

Oebum William F , 9, 10 

Olson, Helen F , 197 

Omvrake, K T , 178 

Orata, Pedro D , 370 

Orleans Jacob S . 301 

Osier, Sir Wilham, 33S ^ 

Otis, Arthur S , 36, 110, HI. 220, 221, 286, 

Olto^’H 'nr, J , 318, 358, 359, 304 365 
Oxtoby, Toby, 372 

Pace, C Roliert, 374, 377, 39S, 399 

Panlaatgut, Isidpro, 315 

Parlcn. Mildr^ D , 52 

Partridge, E ^eMton, 300 

Paterson Donald G , 35, 37 

!:Sron!'iiLT8,'31,“33,-<8,85,86,102. 

101, 109,433,435 
Pease, Glenn H , 313, 314 
PeatUe, Donald Culross, 9 
Peel, E A, 327 
Petera Charles 0,243 
Peterson, Joseph, 69 
PtaUips Alexa'idcr J, lo6 

Pieper, C J , 168, ‘“’jl' 44 59 287 
T , 299 

Postman. Uo J , 327 327. 340, 

Pressey, Sidney L , iv, 

363,364,360 
Price, Helen G , 180 
Pythagoras, 9 

Quetelet 10 

Raths, E , 141 1« 

Reaves, W C , 

Reeder, Ward G , 404 

Reeve, Ethel B , 183 ^ jpo, 312, 

Remmers, Hermann 

nM 20 21,38,45,67,59 

IIS lx V'' 

SoukSxrEt«i 


Ross, G C , 133, 240, 306, 311, 317, 319, 
320 363 

Rothney, John W M , 371, 372, 415, 422 
Rousseau Jean Jacques, 347 
Ruby, laonel, 26 

Ruch, G M , 54, 59, 156, 163-165, 170, 
193, 195, 294 
Rutc, Harold 0 , 17, 59 
Rdon, Phillip J , 111, 134, 278, 300, 417, 
419,420 

Russell, Bertrand, 5, 6, 11, 12, 19, 26 
Russell David H , 327, 373 
Ryan, T M , 182 
Ryan, W Carson, Jr , 194 
R>ans David G , 31, 246 

Sacks, Elinor L , 133 
Sarton, George, 26 
Saucier W A , 24 
Saupe, Joe L , 398 

Riiivain- Walter Howard <sbu 

Scales Douglas E , 15, 21, 24, 60, 127, 133, 
249, 264 296, 338 
Schrader, \\ lUiam B , 103, 371 
Scbraraioel, H E , 182 

&hwab’, Josrph j> . '6^ 

Scott, WdharaO N , 415 
Scru^, Sherman D , 3*9 

iSfe&d'G'l34 245,300,371 

Sege? Dayd, 

Secuio Edouard, 35 

Shakespeare, 

Sharp, George, 421 

Sbaw, PiiMe C , 38 

Sherman, N n,H‘* 

Stub, Hu, 4 

iSff.B F,168 177g 

sS 

Smeltzer, C H , Ali 
Smith, B Othemil 6 11,^8 

Smith,EugeneB.l“t^ ' 

„ 422 p jgg 189 

Sones, W VY 

Spanney, | ' ^ jgg^ 24t> 





Slaicb, Daniel, 40, 41, 63 
Starkej, Mary L , 246 
Stauffer, Rus'^ell G , 340 
Slenquiit, John L , 37, 210, 237, 23S, 210, 
278 

Stephenson, William, 52, 55, 191, 216 

Stern, Wilhelm, 31 

Sternberg, Jack J , 419 

Stevens, S Smith, 20 _ 

Stoddard, George D , 54, 16b, 225, 287 

Stone. C W , 39, 16S, 331 

Stoufier, Samuel A , 24 

Strang, Ruth M , 345, 372 

Strajcr, George D , 395 

Strong, E K , Jr , 51 

Strunk, Mildred, 52 

Stuit, De«e) B , 340 

Stutsman, Rachel 2S9 

Sueltz, Ben A , 327 

Super, Donald E , 55, 240, 372 

Sutherland, A A , 348 

S>ke3, Gresham M , 415 

S) monds, Percival, 48, 54, 163 

S«an, J N,276 

Taba, Hilda, 303 
Tallmadge, htargaret, 323 
Terrnan, Lewis M , 30-34, 53, 108, 110, 
111, 170, 232, 279, 2S0, 282-284, 2S6- 
288, 290, 299, 345, 331, 303, 306 
Terry, Paul W , 323, 324 
ITielen, Herbert A , 414, 421 
Thisted, M. N , 313 
Thompson, George G , 323 
Thompson, Loring M , 273 
Thorndike, Ednard L , 17, 30, 33, 34, 38, 
39, 43, 51-55, 113, 121, 133, 237, 260, 
279, 293, 323, 376, 400 
Thorndike, Robert L , 116, 284, 300, 400, 
415 

Thurstone, Louis L , 37, 38, 49, 52, 59, 144, 
221, 281, 289, 418

Thurstone, Thelma Gwinn, 289, 418
Tiedeman, David V , 300, 371, 398, 418, 419
Tiegs, Ernest W , 175, 358
Tilton, John W , 23
Tinkelman, Sherman, 152
Tippett, L H C , 105
Todhunter, Isaac, 19
Torgerson, Warren S , 206
Townsend, Agatha, 135, 162, 191, 246, 372 
Travers, Robert M W , 135, 162, 191 
Traxler, Arthur E , 103, 127-129, 135, 162,
183, 191, 195, 211, 243, 246, 300, 334,
341, 344, 345, 372, 378, 398, 415
Trow, William Clark, 414
Troyer, Maurice E , 374, 399
Tsao, Fei, 298
Tucker, A C , 240 

Turney, Austin H , 22, 311, 312, 358, 360,

Turrell, Archie M , 371
Tyler, F T , 314

Tyler, Ralph W , 21, 114, 115, 117, 140,
141, 327, 338, 346, 374, 401, 415, 422
Tryon, Robert, 9


Updegraff, Ruth, 225
Upton, Clifford B , 119 

Vaughan, Kenneth W , 245
Vernon, M D , 273

Vernon, Philip E , 135, 170, 190, 221, 244,
421

Voelker, 47

Votaw, David F , Sr , 154


Waldrop, Robert S , 417
Walker, Helen M , 10, 60, 85, 90, 105, 378,
456


Warmr, Chirk < Dudltj, 37 
Washburne, John Noble, 249
Washington, George, 20
Watson, Goodwin, 48, 49, 422, 423
Watts, Winifred, 406
Weber, E H , 8, 31

Wechsler, David, 55, 215, 279, 282, 289,
418, 422 

Weidemann, Charles C , 194, 190, 198,
200, 204 
W’i.»txel, Henrs I , 371 
Weitzman, Ellis, 135, 160, 162, 181, 191
Wesley, E B , 182

Wesman, Alexander G , 101, 103, 134, 245,
371, 419 

W'ostawax.r W\ 7, 12, 13, 132 

Wheeler, L R , 277

White, Clyde W , 300

White, Emerson E , 29, 30

Wliitc. Hiilxirt B , 314 

Whitehead, Alfred North, 4, 8, 9 

Whitney, Frederick L , 26, 399

Wilbur, Ray Lyman, 338

Wilder, M, 311

Wilds, Elmer Harrison, 362

Wilkins, Laro> W’ , 303 

Willey, Roy D , 372

Williams, L>onald, 26 

Williams, L A , 359 

Williamson, E G , 372

Willis, Mary, 422

Wilson, Guy M , 134, 332 

Wilson, Kenneth, 371 

Wissler, Clark, 33

Witty, Paul, 51, 363, 366

Wolfle, Dael, 372

Wood, Ben D , 54, 276, 350 

Woodworth, Robert S , 49 

Woody, Thomas, 356

Worcester, D A , 201, 246

Wright, W H E , 174

Wrightstone, J Wayne, 117, 184, 187, 200

Wrinkle, William L , 410, 411, 415

Wriston, Henry M , 376

Wundt, Wilhelm, 30, 33

Wyndham, Harold S , 358, 359 


Young, Kimball, 59
Yule, G Udny, 105 


Ziegfeld, Edwin, 377
Zook, George F , 382



Subject Index 


Abilities of Man, Spearman, 54
Abilitj, defined, 27o 
Ability groups, 357-65
acceleration, 363-65
arguments for and against, 358
continuous promotion, 365
experimental evidence, 358-60
retardation, 363-65
technique of grouping, 361-63
Abnormal psychology, France and, 31-34
Acceleration, 363-65
Accomplishment quotient (AQ), 298
Achievement 

intelligence and, comparing, 296-99
scores, intelligence and, combining, 298-
99

Achievement tests 
defined, 23 

diagnostic, development, 44 
history, 38-46

improved examinations, 45-46
progress before 1918, 38-39
progress since 1918, 43-45
unreliability of school marks and ex-
aminations, studies in, 39-43
intelligence vs aptitude, 418-19
norms, use in interpreting scores on,
290-96

objective type, development, 44
practice, development, 44
specific type, development, 44
standardized and nonstandardized, ad-
vantages and limitations, 216-17
survey, development, 44
validation, 111-21
criticisms, 113-14
curricular versus statistical, 111-13
direct vs indirect methods, 115-17
item analysis, 117-19
standard tests, judging, 119-21
Tyler's suggestions, 114-16
Activity movement, 356-57 
Administration 
ease of, usability, 127-28
procedure for, 228-30
school, measurement in, function, 21-22
test, 225-30
time for, 225-27
who should administer, 227


Administrators, school, records and re-
ports for, 240-43

Age 

chronological, see Chronological age
educational, see Educational age
increase, criterion of intelligence, 108
mental, see Mental age
promotion, see Promotion age
subject age, see Subject age
Agencies of public information, ordinary,
402-04

newspapers, local, 402-03
student publications, 403-04
Allport-Vernon-Lindzey Study of Values,
170, 190, 421

Alternative-response tests, 174-79
advantages, 174-75
construction, rules and suggestions for,
178-79

definition, 174
illustrations, 175-78
limitations, 174-75

America, applied psychology and, 33-34
American Council on Education ^jeho- 
logicai E'camination, S26 
American Journal of Psychology, 52
American Psychological Association, 19,
36, 416

Annual reports, public relations, 404
Answer keys, preparing, 158
Application, ease of, usability, 129-30
Applied psychology, America and, 33-34
Applied sciences, measurement in, 11-12
Appraising Vocational Fitness, Super, 55
Aptitude Testing, Hull, 54
Aptitude tests 

achievement vs intelligence, 418-19 
special, 37-38 

AQ, see Accomplishment quotient 
Army Alpha tests, 16, 36-37, 108 
Army Beta tests, 37 

Arthur Adaptation of Leiter International
Performance Scale, 422
Astronomy, measurement in, 6
Ayers educational index, 115

Bar graphs, 262, 263, 264
Barrett-Ryan Literature Test: Silas Mar-
ner, 182




Bible, testing device in, 27

Bibliography of Mental Tests and Rating
Scales, Hildreth, 54

Biological sciences, measurement in, 7-9
Books, important, 54-55


CA, see Chronological age
California Achievement Tests, Advanced
Battery, 175-76, 254-55, 256
California Short-Form Test of Mental Ma-
turity, 110, 287
Capacity, defined, 276-77
Central tendency, concept of test data, 69,
73-74
measure, 75

Character Education Inquiry, 47
Character measurement, history, 46-52
beginnings, 46
development, 46-48, 51-52
interview, 51
questionnaire, 49-51
rating scales, 48

Children's Apperception Test (CAT), 421
Chinese civilization, testing devices in, 27-
28

Chronological age (CA), 279-80
Circle graphs, 263

Clapp-Young self-marking tests, 129
Classification
ability groups, 357-65
acceleration, 363-65
arguments for and against, 358
continuous promotion, 365
experimental evidence, 358-60
individual and group instruction, 357
retardation, 363-65
technique of grouping, 361-63
activity movement, 356-57
human variability, 347-56
group differences, 348-50
individual differences, 351-52, 353-56
problem, 347-48
trait variability, 352-53
test results, 61-69
rank order, 61-62

Class interval, selecting, frequency distri-
bution, 64

Coefficient of correlation, see Product-
moment coefficient of correlation
Coherency, criterion of intelligence, 109
Cole-von Borgersrode Scale for Rating
Standardized Tests, 222-24
College Board Review, 192
College Entrance Board Examinations,
192, 195, 204, 305

Colorado experiment, reporting to par-
ents, 410-11

Column diagram, 258-59
Committee on Standards for Graphic Pre-
sentation, 272
Completion tests, 170-74
advantages, 170

construction, rules and suggestions for,
172-74

definition, 170


Completion tests (Cont ):
illustrations, 170-71
limitations, 170

Computations, concepts versus, quantita-
tive data, 69-75

Concentration, objective of instruction,
145
Concepts

computations versus, quantitative data,
69-75

co-relationship or concomitant varia-
tion, 85-90

Concomitant variation, concept, 85-90
Concurrent validity, 417
Confidence, stage in use of tests, 57-58
Congruent validity, 417
Construction and Analysis of Achievement
Tests, Adkins et al , 55
Construction of tests, see Test construction
Content validity, 417
Continuous promotion, 365
Co-operation, objective of instruction, 146
Cooiierative Athievcmcnt 'Jests, 112, 140, 
291 

Cooperative English Test, 171, 422 
Cooix-ratne Trenth Test dumor Form 
1916, 183, lSS-89 

Coojicrativc Plane Geometry Test, Re- 
vised Scries Q, 177 

Cooperative Solid Geometry Tests, 178 
Cooperative Study of bccondary Siliool 
Standards, 374-76, 370, 378-80, 
383, 384, 388. 390, 394, 395 
Cooperative Test of Social Studies Abih 
ties 152, 184, 157-88 
Co-relationship, concent, 83-00 
Correlation, 86-90 

coefficient, see Product-mument coclB- 
cicut of correlation 
rank, 95-98 

Cost usability of mcasunng iiLstrument, 
130 

Creativcnesi, objective of instruction, 144 
Criterion prohlem, 417-1^ 

Critical caution, sUgc in use of tests, 58 
Crude scores, see Raw scons 
Curiosity, stage m use of tests 57 
Curncubr validity, 101 
statistical versus, 111-13 

Davis-Eells Games for Grades I-VI, 422
Deciles, 75

Decision making, information and, 423-24
Derived scores, 288-90
intelligence quotient, see Intelligence
quotient

raw scores versus, 278-79
Deviation

quartile, see Quartile deviation
standard, see Standard deviation
Diagnosing Personality and Conduct, Sy-
monds, 48
Diagnosis, 328-45

causes of errors, locating 337-41 
individuals necdine h)."atuiK. 332 33 
1* veU ^12 




nature of dilTicully, locatiiiR ZiZ-di 
nature of educational, 328-30
preventive, 345
problem, 328-31
remedial procedures, 341-45
techniques, 332-45
value in education, 330-31
Differential Aptitude Tests, 418-19
Differential prediction, 419
UilhcuU>, measure. 

Discrimination, measure, item-analysis,
437-40

Discriniination table, item-anab'is, 44 - 

Distribution, frequency, see Frequency
distribution

Draft, preliminary, preparing the test,
147-48, 149

EA, see Educational age
Education 

meaning of, ‘3-10 

tests, hiaiorj, 40-5- 

intelligence tests, bistorj, 3U-3» 

i^ent tendencies, SO-ob 
types, 23-M 
victt-sof, 1" - 

'r»n:%l®o“ 

rlistoricin, 10 

«on, 53 

nrronn‘!|;;Stve.c.,20(F9. 
limitations, 291-93 

llrnr.- 

f JSl-91 

errsiii-oir 

use, 292 jj. Bureau 243 

Educationa 257 . 

rducational aest cu pjogre^'ive Wu 

&t-A'earStud> j^o 37^4 

M”MuSmnalqn«M"* 


Error 

causes of, diagnosis, 337-41 
measures of, 101-03 
measurement, 101, 102-03 
sampling, 101, 103 
technique, 101, 102 
Essay examinations, 192-205
advantages, 196-97, 200-01
grading by sorting, 204-05
improving suggestions lor, iy/-^o 
advantages, special, 200-01
construction, 198-99
scoring, 202-04
use, 198-99
limitations, 193-95
preparing students to take, 201-02
reliability, 193-95, 196
usability, 195, 196
validity, 193, 196-97
Essentials of Psychological Testing, Cron-
bach, 55
Evaluation 

in guidance, 367-71

co-operative venture, 370-71
importance, 367-68
meaning, 367-68
measurement in, 369-70

"'^'Sr;S^®|tud> o|.gf 

School Standards, 375-80 
difiBcully , 376-77 ««, qi 

educational program, 3»4-y4 
importance 376-70 
index of variation, 39^5 

Sy„TaS)»\pl-y95j97 

problem 373 SO 
teaching efficiency, 377-i8 
testa m use siS .„A,y> 

Ero’Pu^^Tes.mPbys-,18^ 

e»nu»U0» 

fchool .c_iR 

Oe,ma»c ,nd 

^ 30-31 





Frequency distribution or table (Cont ):
graphical representation (Cont ):
circle graphs, 263
column diagram, 258-59
frequency polygon, 259-60
histogram, 258-59
pictographs, 263
pie graphs, 263
skewed curves, 262-63
smooth curve, 260-62
symmetrical curves, 262-63
typewriter graphs, 263
which graph is best?, 263-64
making, 64-66

scattergram or scatter diagram, 67-69
two-way, 67-69
Frequency polygon, 259-60
Frequency table, see Frequency distribu-
tion

George Washington University English
Literature Test, 178

Germany, experimental psychology and,
30-31

Gestalt school of psychology, 9, 16-17
Grade norms 
limitations, 293-94 
use, 293-94 
Grading, see Scoring
Graphs, 247-73

constructing, suggestions for, 271-73
distributions, two or more, representing,
264-71

central tendencies and variabilities of,
269-71

central tendencies of, 267-69
entire distributions, 264
frequency, see under Frequency distri-
bution

percentile curves, use, 265-66
polygons, use, 264-65
frequency distribution, representing,

258-64

bar graphs, 262, 263, 264 

circle graphs, 263 

column diagram, 258-59 

histogram, 258-59 

pictographs, 263

pie graphs, 263

polygon, 259-60

skewed curves, 262-63

smooth curve, 260-62

symmetrical curves, 262-63

typewriter graphs, 263

which graph is best?, 263-64

record of an individual, representing,

254-58

profiles for series of subjects, 254-58
profiles of single subject, 254
value, 247-54
attention getting, 247-49
points clarified, 249
retention aided, 249-54
Gregory Tests in American History, 171
Group 

ability, see Ability groups


Group (Cont ):
differences, 348-50
instruction, 357

Grouped frequency distribution, 64, 65
Guidance, evaluation in, 367-71
co-operative venture, 370-71
importance, 367-68
meaning, 367-68
measurement in, place of, 369-70

Heinis Mental Growth Unit, 288
Higher institutions, principles of evalua-
tion for, 382-83
Histogram, 258-59

Holzinger-Crowder Uni-Factor Tests, 418
Homogeneous groups, see Ability groups
How to Measure in Education, McCall, 54
Hudelson Scale, 43
Human variability
educational significance, 347-56
group differences, 348-50
individual differences, 351-52
educational provisions for, 353-56
nature, 347-50
problem, 347-48
trait variability, 352-53

IBM General Purpose Answer Sheet, 159
Improvement of the Written Examination,
Ruch, 54

Index of variation, evaluation of schools,
397-98
Individual
differences, 351-52
educational provisions for, 353-56
instruction, 357
profile chart, 239

Information, decision-making and, 423-24
Initiative, objective of instruction, 144-45
Instruction

individual and group, 357
measurement in, 303-425
classification and promotion, 347-65
diagnosis, 328-45
evaluation in guidance, 367-71
evaluation of schools, 373-98
function, 21-22

motivation and practice in testing,
303-27

public relations, 400-15
trends, present, 416-24
objectives, categories, 141-42
concentration, 145
co-operation, 146
creativeness, 144
initiative, 144-45
interest, 145
judgment, 145-46
motivation, 145

outcomes, provision for evaluating, 141-
46

relation of measurement to motivation
in, 304-06

teaching efficiency, evaluating, 377-78
Intelligence

achievement and, comparing, 296-99





Intelligence (Cont ):
meaning, 108

scores, achievement and, combining,

Terman criteria, 108-09
Intelligence quotient (IQ)
advantages, 284-85
computation, 281-83
equating values, table for, 286
interpretation, 283-84
limitations, 285-88
mental age vs , 279-80
Intelligence tests

achievement vs aptitude, 
di fined, 21 

Ins lor.v, 30-38 , 

abnormal psjchologv, France anu, 

applied psjchologv, America and, 

children of necessity, 34-37
experimental psychology, Germany
and, 30-31

special aptitude tests, 37-38

statistical methods,
norms, use in interpreting scores on,
279-90

validation, 108-11

individual vs group, 109-11
meaning of 

Ternnn criteria lO^J . 

Interest, objective ‘"f ''lir.' 4(ii2 
Interest measurement, hi'torj, 
lieginnings. 40 
development, 4r>-is, oi- 
interview, 31 
questionnaire, 49-51 

I„.orpSo^,’“”* i 

,„,nprZt.0. 0} Eiijnlmnci . 17 ro.,»rr- 

terest mcs'urcm «, „„j 

176, 254, 2oo 2(x> 

«0- 

diBiculty, mensure, «0 

discrimination, “'f 447_5i 

discnminauon table, i 

illustrative anaijsf • ’ 

Items, pteparme, 

n’?, .“reoefficent obt niimB 4,2 
Sntrd devnnon obtanimg 4- 
validation, 117-lJ 


Items, test

analysis, see Item-analysis procedure
preparing, 436-37
ranking, 454-55

Journal of Applied Psychology, 53
Journal of Consulting Psychology, 53
Journal of Educational Psychology, 52
Journal of Educational Research, 53
Journal of Genetic Psychology, 52
Judgment, objective of instruction, 145-46

KR No 20, 134

Kuder and Richardson formulas, 124
Kuder Preference Record, 51, 129, 421
Kuhlmann-Finch Intelligence Tests, 182

L'Année Psychologique, 31, 52
Learning

amount and quality, relation of meas-
urement to, 309-23
awareness of final examination, 312-

15

frequency of tests, 309-12
knowledge of results combined with
other incentives, 319-23
knowledge of test scores, 315-19
motivation in, relation of measurement
to, 306-25

type of, relation of measurement to,

Leiter International Performance Scale,
Arthur Adaptation, 422
Letters to parents, 410-13

Colorado experiment, 410-11
University of Chicago High School Sys-
tem, 411-13
Limitations

alternative-response tests, 174-75
completion tests, 170
educational age, 291-92

educational quotient, 292-93
essay examinations, 193-95
grade norms, 293-94
intelligence quotient, 285-88
matching tests, 186

nioasuremenb 12 10 

distribution, 64 

Local norms, value, 295 JO, -J 

Matcbmg 

“ITSon rules and -uBcCion,. 
189-90 

definition 186 

dlustmtimis 1^7 ^J 
liniiliinm-rr 10b 





Mean, 73 

computing, from scattergram data, 91
finding, 78-81
median versus, 74-75
obtaining, item-analysis, 453
short way to compute, 80
Measurement 

errors in, controlling, 13
errors of, measures, 101, 102-03
importance, 3-4
in applied sciences, 11-12
in biological sciences, 7-9
in education, 13-25

achievement tests, history, 38-46
character, personality and interest
tests, history, 46-52
function, 21-22
historical development, 27-58
intelligence tests, history, 30-38
place of, 10-18

publications, important, 52-55
recent tendencies, 56-58
types, 23-25
views of, 17
in guidance, 369-70
in instruction, 303-425
classification and promotion, 347-
65

diagnosis, 328-45
evaluation in guidance, 367-71
evaluation of schools, 373-98
motivation and practice in testing,
303-27

public relations, 400-15
trends, present, 416-24
in modern world, 3-25
in physical sciences, 6-7
in science, 4-13
in social sciences, 9-11
limitations, 12-13
problem of, 3-134 
generalizations regarding, 131-34
relation to amount and quality of learn-
ing, 309-23

awareness of final examination, 312-
15

frequency of tests, 309-12
knowledge of results combined with
other incentives, 319-23
knowledge of test scores, 315-19
relation to motivation, 304
in learning, 306-25
in teaching, 304-06
relation to type of learning, 323-25
teacher and, purpose, 304
teaching emphasis and, 304-06
Measurement in Higher Education, Wood,
54

Measurement in Secondary Education,
Symonds, 54

Measurement of Adult Intelligence, Wechs-
ler, 55

Measurement of Intelligence, Terman, 34,
53

Measuremerd of Intelligence, Thorndike, 


Measuring instrument, satisfactory, char-
acteristics, 106-34
importance of problem, 106
reliability, 121-27
determining, methods, 122-24
importance, 121-22
interpretation, 124-25
meaning, 121
objectivity and, 125-27
usability, 127-31
administration, ease, 127-28
application, ease, 129-30
cost, 130

interpretation, ease, 129-30
meaning, 127

mechanical make-up, 130-31
scoring, ease, 128-29
validity, 107-21
achievement tests, 111-21
considerations, general, 107-08
intelligence tests, 108-11
meaning, 107

Mechanical make-up of tests, usability,
130-31
Median, 73

determining, from scattergram data, 94-
95

finding, 75-78
mean versus, 74-75
process of locating, 70
Mental age (MA)
concept, 32
advantages, 280

intelligence quotient versus, 279-80
limitations, 280-81

Mental Measurements Yearbooks, 54, 120,
121, 218

Mental quotient, 31 
Mental Testing, Goodenough, 55 
Methodology of Educational Research, Good 
et al , 127 

Metropolitan and Standard Achievement 
Tests, 291 
Midscore, 75-76 
Mind, 52 
Mode, 73 
finding 75 

Modern School Achievement Tests, Lan-
guage Usage, 182
Motivation
experiment in, 306-07
limitations on, 307-08
types, 308-25
importance, 303
meaning, 303-04
objective of instruction, 145
practice effect, 326-27
problem, 303-04
relation of measurement to, 304
in learning, 306-25
in teaching, 304-06

studies, educational implications, 325- 
26 

for educational practice, 325-26 
for educational theory, 325 





Multiple-choice tests. ir&-Sf nv. . ^ 

construction rules and suggestions for, 

definition, 179-SO illustrations, 187-89 

illustrations, 181-84 

limitations, 1S0~S1 niuitiple-choice 179-86 

Dossibilitic®, ISO-Sl 


ISationaj^Quncd of Teachers of English, illust 

National Education Association, 374, 380 l3 

National norms, 293 reil^u. 

Natural sciences, nie'i«urcment m, &-9 simpleM 

Nelson Iligli School English Test, 181-82 advai 

»- , , cornti 

Acw-spapers, local, agencj of puUic mfor for. 

mation, 402-03 defimi 

New 1 orL Slate Regents, 305 illustr 

XiMtccn F orty Mental Mtaiurementa Year- Uniita 

hooh, 55 types ll 

Nonstuidardired teats, standardized vs, validity 

274 75 164 

Normal cur\e, 2G0 Objectivity 

Normal Progrtss Chart, 2o7-58 O«»«palton. 

Norms Official pul 

prade, 203-04 06 

mtelligcnee and achi«.vtinent, compar- annual p 
mg, 29G-99 special r< 

interpreting scores oo achievement Ogive, 260 


construction, rules and suggestions
for, 184-86
definition, 179-80
illustrations, 181-84
limitations, 180-81
possibilities, 180-81
rearrangement, 190-91
simple-recall, 167-70
advantages, 167

construction, rules and suggestions
for, 169-70
definition, 167
illustrations, 167-69
limitations, 167
types, 163

validity and reliability, comparative,
164-67

Objectivity, reliability and, 125-27

Occupations, 53

Official publications, public relations, 404-
06

annual reports, 404
special reports, 405-06


tests, use, 290-96

interpreting scores on intelligence tests,
use, 279-90

interpreting scores on personality tests,
use, 299-300
local, value, 295-96, 297
national, 295
percentile, 294-95
scores, raw and derived, 276-79
standards and, 274-76
North Central Association of Colleges and
Secondary Schools, 382
Northwestern Intelligence Tests, 422
Novel tests and items, 422-23

Objective tests

alternative-response, 174-79


Ohio State University Psychological Test,
129

Opinion, public, mobilizing, 414-15
Organization, school, evaluating, 395-97
Otis Quick-Scoring Mental Ability Test,
287

Otis Scales for Rating Standard Tests, 220
Otis Self-Administered Test of Mental
Ability, 287

Otis Self-Administering Higher Examina-
tion, 110-11

Parents

letters to, 410-13
opinion of, sampling, 414-15
reports to, 245

Parcat-teacher association, pubLc mfor- 


a.,d P-„t'feSr ^o^..y 245 

dlS-LtioD8. 175-78 «« 


for, 178-79
definition, 174
illustrations, 175-78
limitations, 174-75
completion, 170-74
advantages, 170


aavaniages, nu , 

construction, rules and BuggesUons 


Percentile curve, 260, 265-66
Percentile norms, 294-95
Percentile rank, 288-89


for, 172-74
definition, 170
illustrations, 170-71
limitations, 170
construction, principles, 163-91
frequency of use by teachers, 163-64
matching, 186-90
advantages, 186

construction, rules and suggestions,
189-90

definition, 186


computation, 78 
D, measure of variability, S4-tw 
Personal Constant (PC), 288 
FersonaMy teste, HS'” “‘"'T'"'' 

me scores on, 299-3TO 
Fereonahty measurement, history, 46-5^ 

SS“pSmO-48,51^2 
mterview, 51 
qoestionnaife, 49-.^i 
rating scales, 48 





Personnel and Guidance Journal, 53 

Personnel Selection, Tests and Measurement
Techniques, Thorndike, 55
Philosophy of the school, evaluating, 383-84
Physical sciences, measurement in, 6-7
Physics, measurement in, 7
Pictographs, 263
Pie graphs, 263

Pintner General Ability Tests, 287
Pintner-Paterson Performance Scale, 35,
37

Planning the test, 140-47

administration conditions, considering,
147


emphasis in course, reflecting proportion
of, 146-47

evaluating outcomes of instruction, pro-
vision for, 141-46

purpose to be served, considering, 147
Plant, school, evaluating, 395-97
"Platform for the Use of Standard Tests,"
211-12


Polygons, use, graphical representation,
264-65

PrA, see Promotion age
Predictive validity, 417
Preliminary draft, preparing the test, 147-
48, 149

Preparing the test, 147-55
arranged in ascending order of difficulty,
151-52

difficulty of items, 148-49
directions to pupil, 153-54
particular type of items placed together,


pattern of responses, avoiding regular
sequence in, 152
phrasing of items, 149-50
preliminary draft, 147-48
preliminary draft items, 149
revision, critical, 149
types of items, 148

VikAwA. 4 «ntV«rta in 
answer, 150-51 

written record of responses, provision 
tor, 152-53 

Preventive diagnosis, 345
Primary Mental Ability tests (PMA), 219,
418

Principles of Science, Jevons, 30
Product-moment coefficient of correlation,
r, 86-90

computing from scattergram, 93-94
interpreting, 98-100
magnitude or size, 98-99
obtaining, M

relationship represented by, 89
reliability, 101
sign, 98

validity, 101, 109
Professional journals, 52-53
Profile chart, individual, 239
Profiles, 254

series of subjects, 254-58
single subject, 254
warnings concerning, 243-45


Progressive Education Association, 117, 
140, 374, 422 
Promotion 

acceleration and retardation, 363-65 
continuous, 365
Promotion age (PrA), 298
Promotion quotient (PrQ), 298
PrQ, see Promotion quotient
Psychograph, 254
Psychology

abnormal, see Abnormal psychology
applied, see Applied psychology
experimental, see Experimental psychol-
ogy

Gestalt school, 9, 16-17
Psychology Colloquium, University of
Wisconsin, 423

Psychology of Musical Talent, Seashore, 54
Psychometrika, 53
Publications
important, 52-55
books, 54-55

professional journals, 52-53
public relations, agencies of public in-
formation

newspapers, local, 402-03
official, 404-06
student publications, 403-04
Publicity, 401

Public opinion, mobilizing, 414-15
Public relations, 400-15
agencies of public information, ordinary,
402-04

newspapers, local, 402-03
student publications, 403-04
letters to parents, 410-13
Colorado experiment, 410-11
suggestions for, 410
University of Chicago High School
System, 411-13
official publications, 404-06
annual reports, 404
special reports,

parent-teacher associations, 413-14
principal sources, 402
problem, 400-02
programs, meaning, 401-02
public opinion, mobilizing, 414-15
report cards, 406-12
Hill's study, 407-08
trends in, 406-07
school exhibits, 413
school visitation, 413
Public School Attainment Tests for High
School Entrance, 171
Publishers of standardized tests, 464-65
Pupils, report to, 237

Qualitative evaluation technique, 420-22 
Quantitative data 

concepts versus computations, 69-75 
central tendency, 69, 73-74, 75
eleven categories, 72
four categories, 71
grading dilemma, 69-70 
mean vs median, 74-75 





QuaulitatJNC data (Co«0 

concepta \cr>u8 compuUition« (CotiI ) 
one catcRorj, 70 ^ 
three catcK<)no«. a 
twocatrpoms, <0-p 
\ anabihtj , 09, 7 1-73 rn_7!; 

elementarj notion** conrernmE, G'l-7o 
mean, 73 

GnciinR. 7S-M 
meth in , 71 
rncdnn, 73 
findinp, 7 >^7s^_ 
mean a"* , 71--/<> 
mode, 73 

finding, 75 , . 70 

pcrctnlilc«, computation, 7b 

QuartilcdiMation, 7- 

vanibilita measure, Sl-b-s 
Quartilca,77 , , ncreouahtj.and 

•'* Accomp»«nl 

rb™r«su>"”r,»oi.o..t 

vanabdita moa«urc^8l 
Rank correlat! 


SoSEbVte 
j,nUns iwylf™’ . 

Jink order, 01-«. „,,onaMy. »■»> 

s?rrSsS'»‘ 

lecords, 23&-^o 94CV-43 

lor odniiiiirtroto'S.J*" ^,55 
groph.eal.rrP^''””'" 

,etSK'«,„tr',“^90 

interpreting 95-100 

obtaining, yi* . ^ni 

reliabilit> ! jOl 

validity or concomitant vans- 

co-relationship o,r,Q0 

Iron, concept, 8^ 

ocpectancytabte,!!^' 

rank correlation g„.93 

Bcattcrgram, coos«“\,J*g'^ 94 

94 

RehabiUty 


Reliability (Cont ):

determining, methods, 122-24
with one test form, 123-24
with two test forms, 122
essay examination, 193-95, 196
importance, 121-22
interpretation of test, 124-25
meaning, 121
objective tests, 164-67
objectivity and, 125-27
present trend, 416

quality of satisfactory measuring instru-
ment, 121-27

Remedial procedures, diagnosis, 341-45
Report cards, 406-12

Colorado experiment, 410-11
Hill's study, 407-08

University of Chicago High School Sys-
tem, 411-13

Reporting to parents, see Letters to par-
ents

Reports, 236-45

annual, public relations, 404
for administrators, 240-43
special, public relations, 405-06
to parents or public, 245
to pupils, 237
to teachers, 238-40
Rcpulalioo, m 99“ 

Research in education, 19, 20
Results, test, see Test results
Retardation, 363-65


Scaled test, defined, 24 
^SrdisSn^fshedfS,'24 

|*a»^|eVca.tergra. 

”^u”'SeSr‘Un|tra.ing,87 
;,Sr?K“ona from, rompulmg 

C“»CSW‘t7«-» 

384-04 

SSaSu‘.‘dt>a"t 298-97 





Schools, evaluation (Cont ):
philosophy of, 383-84
principles of, general, 380-83
for elementary schools, 380-81
for higher institutions, 382-83
for secondary schools, 381-82
problem, 373-80
teaching efficiency, 377-78
tests in, use, 379
Science, measurement in, 4-13
&it.iwe Ui~>eari!i j\5£<K.nte8 (SllA) 
>.oii-Vtrb-il l«*st, 110 11 
Pnmm Menial Ahilitj testa, 110-11, 
2.'>7 

sclf-'c-oniiR test 129 
Sen nuGc lutUiod, 4-C 
Soiree 

an.ihntig and interpreting, 284-35 
dehnilion, 27i> 7h 
den%e«i, «<e lJcn\c<l scores 
mtrlligence niid HchicvLincnt, comlnn- 
ing 29S *19 

inlcrpretuiR on aelucv rnicnt tests, use of 
norms in, 2 K) IKi 

interjircluig on nitellij.'ene-e IC'ls, use of 
norm' m 279-90 

mterpretmg on i»er=onalil} tests, use of 
iiorne' in, j'j'l JQO 
raw \8 derived 278-79 
sigma, 2S9-90 
standanl 85, 2S'J 90, 295 
T-'torcs, 293 
Z-'torc' 85,289-90 
Scortze, 129 

Scoring the tests, 230-34
ease of, usability, 128-29
essay examination, 202-04
by sorting, 204-05
procure lib 3S 
rules, preparing, 158
techniques used, 231-34
who should score, 230-31
Scott Man*to-Mfljj hijjk’, IS 
Seashore Test of Musical Talent, 37
Secondary schools, principles of evaluation
for, 381-82

Selected References on Test Construction,
Mental Test Theory, and Statistics,
1929-49, Goheen-Kavruck, 55
Semi-quantitative evaluation technique,
420-22

Seven Seals of Science, Mayer, 9-10
Seventeenth Yearbook of the National Society
for the Study of Education, 53
Sigma scores, 289-90
Simple-recall tests, 167-70
advantages, 167

construction, rules and suggestions for,
169-70

definition, 167
illustrations, 167-69
limitations, 167
Skewed curves, 262-63
Smooth curve, 260-62
Social sciences, measurement in, 9-11


Sones-Harry High School Achievement
Test, 168, 189

Spearman-Brown formula, 124
Special aptitude tests, 37-38
Special reports and publications, public
relations, 405-06
Specific determiners, 149
Square roots, computation, 456-58
Standard deviation, 72
computing from scattergram data, 94
obtaining, item-analysis, 452
practical uses, 85
simplified way to compute, 84
variability measure, 83-85
Standardized tests

nonstandardized vs , 274-75
publishers, 464-65
Standards, norms and, 274-76
Standard scores, 85, 289-90, 295
Stanford Achievement Test, 44, 170-71
Stanford-Binet scale, 33-34, 127
Statistical analysis, test results, 60-104
classification and tabulation, 61-69
frequency table, 62-69
rank order, 61-62
considerations, general, 60-61
error, measures, 101-03
measurement, 101, 102-03
sampling, 101, 103
technique, 101, 102
quantitative data, 69-75
concepts versus computations, 69-75
relationship, measures, 85-101
coefficient of correlation, 86-90, 93-
94, 98-100

co-relationship or concomitant varia-
tion, concept, 85-90
expectancy tables, 100-01
rank correlation, 95-98
reliability coefficient, 101
scattergram data, 90-95
validity coefficient, 101
variability or scatter, measures, 81-85
percentile D, 84-85
quartile deviation, 81-83
range, 81

standard deviation, 83-85
Statistical methods, England and, 31
Statistical validity, 101
curricular vs , 111-13
Statistics

fifty questions, 429-35
answers to, 459-63
importance, 60
in a capsule, 60-61
Status validity, 417

Stenquist Test of General Mechanical
Ability, 37

Stone Reasoning Test in Arithmetic, 39,

168

Strong Vocational Interest Blank, 51
Student publications, agencies of public
information, 403-04
Subject age, 291
Subject quotient, 291
Symmetrical curves, 262-63





T-scores, 295

TabuhUon, tMt results, 1.2 

frequency table or di'lnbuluui 62-69 
form, 

in.iking, C4-CG 
scatltrgrain, 67-09 
Iwo-waj, 07-69 
Teacher
measurement and, purpose, oOt
records and reports to, 238-40
Teachers College Record, 52
Teaching, see Instruction
Teaching emphasis, measurement and,

d01"OO tfwi> 

Technique, errors ^ 

Terman-McNemar Test of Mental Ability, 110-11, -S7
Test construction. 139 -Hi 
essaj txaniination, 1 dj' y, 
evaluating the test, 159
objective tests,
alternative-response, HWJ
matching
multiple-choice, -SO
simple-recall,

c.ns,d.»«0P, 


arrjuK'-'* V, ti 

JScrr‘°>"p^'’o' .len>= ‘'>- 

pP^SS'p'of'rcU/cB, ^ 

sequence m, 

phrasing, 147-48

preliminary draft, ^^3

IS mSW'“” 
of -P”“- 

Problem, P;'P“^i;"S,.57 

s:„iruM%,i5-5s 

Suo'^?fe/^Lu„oROO.-. 


Test construction (Cont.)
trying out the test (Cont.)
scoring procedure, 156-58
scoring rules, 158
time allowance, 155-56
Testing program, 209-300
administering the tests, 225-30
procedure for, 228-30
time for, 225-27
who should administer, 227
considerations, general, 209-12
co-operative, 213
definite, 214

graphical representation, 247-73
constructing, suggestions for, 271-73
distributions, two or more, representing, 264-71
frequency distribution, representing, 258-64
record of an individual, representing, 254-58
value, 247-54
norms, uses and limitations, 274-300
intelligence and achievement, comparing, 296-99
interpreting scores on achievement

,pS.“fr4 OP M.U.Repce 
OP p.r.oppbt7 
„fSoS'aKnved.core,, 27d- 

standards and norms, 274-76
plan for elementary school, 210

?3“'i™p‘i=0PCe™ng 24W6 
purpose of, determining, 212-14
to administrators, 240-43
to teachers, 238-40
to administrators, 240-43
to parents or public, 245
to pupils, 237
to teachers, 238-40
results, applying, 235 3D

SS‘'?POly-”6 806 .PtorpreMg, 234- 

scorSltete=bi.23M^ 

techniques OMd, 23 W4 

eho-hould'™j,33l>-4‘ 

Xswr,*t, 214-16 
steps 111 Stepnenson, 55 

S";?s5?“^t.slct.=al onalyso.. CP-104 

apphing 23^ tabulation, 61-69 

..Sri^i^Ss Re„cin1,«Ml> 





Test results (Cont.)
error, measures, 101-03
measurement, 101, 102-03
sampling, 101, 103
technique, 101, 102
quantitative data, 69-75
concepts vs. computations, 69-75
mean, finding, 78-81
median, finding, 75-78
mode, finding, 75
percentiles, computing, 78
relationship, measures, 85-101
coefficient of correlation, 86-90, 93-94, 98-100
co-relationship or concomitant variation, concept, 85-90
expectancy tables, 100-01
rank correlation, 95-98
reliability coefficient, 101
scattergram data, 90-95
validity coefficient, 101
variability or scatter, measures, 81-85
percentile, D, 84-85
quartile deviation, 81-83
range, 81
standard deviation, 83-85
Tests 

achievement, see Achievement tests
administering, 225-30
alternative-response, see Alternative-response tests
aptitude, see Aptitude tests
completion, see Completion tests
construction, see Test construction
essay examinations, see Essay examinations
evaluating, 159-62
intelligence, see Intelligence tests
matching, see Matching tests
multiple-choice, see Multiple-choice 
tests 

nonstandardized, see Nonstandardized 
tests 

novel, 422-23 

objective, see Objective tests 
planning, 140-47 
preparing, 147-55
rearrangement, see Rearrangement tests
reliability, see Reliability
results, see Test results
scale distinguished from, 24
scores, see Scores
scoring, see Scoring the tests
selecting appropriate, 214-24
simple-recall, see Simple-recall tests
standardized, see Standardized tests
time allowance for, 155-56
trying out, see Trying out tests
usability, see Usability
validity, see Validity

Tests and Measurements in High School Instruction, Ruch and Stoddard, 54
Tests in English Fundamentals, Grammar, 176-77
Tests of General Educational Development (GED), 422
Tests on Everyday Problems in Science, Unit III, 177-78
Thematic Apperception Test (TAT), 421
Theory and Practice of Psychological Testing, Freeman, 55
Theory of Mental Tests, Gulliksen, 55
Third Mental Measurements Yearbook (1949), 55
Thorndike Handwriting Scale, 39
Thorndike-McCall Reading Test, 279, 295
Three-Year Study of Commission on Teacher Education, 374
Time allowance for test, 155-56
Trait variability, 352-53
Traxler Silent Reading Test, Word meaning, 1^

Trying out the test, 155-58
answer keys, 158
conditions for, insuring normal, 155
scoring procedure, 156-58
scoring rules, 158
time allowance, 155-56
Two-way frequency table, 67-69
Typewriter graphs, 203

Ungrouped series, 61

Unit Scales of Attainment m Foods and 
Household Management, 183 
University of Chicago High School System of reporting, 411-13
Usability
administration, ease, 127-28
application, ease, 129-30
cost, 130
essay examination, 195, 196
interpretation, ease, 129-30
meaning, 127
mechanical make-up, 130-31
quality of satisfactory measuring instrument, 127-31
recent tendencies, 57-58
scoring, ease, 128-29
Utilizing Human Talent, Davis, 55

Validity
achievement tests, 111-21
criticisms, 113-14
curricular vs. statistical, 111-13
direct vs. indirect methods, 115-17
item analysis, 117-19
standard tests, judging, 119-21
Tyler's suggestions, 114-15
coefficient, 101
considerations, general, 107-08
curricular, 101
statistical vs., 111-13
essay examination, 193, 196-97
intelligence tests, 108-11
individual vs. group, 109-11
meaning of intelligence, 108
Terman criteria, 108-09
meaning, 107
objective tests, 164-67
quality of satisfactory measuring instrument, 107-21





Validity (Cont.)
statistical, 101
curricular versus,
types, 416-17
Variability
concept of test data, 60, 71-72
human, see Human variability
meaning, 81
measures, 81-85
percentile, D, 84-85
quartile deviation, 81-83
range, 81
standard deviation, 83-85
vocabulary of, 72-73
Variations, concomitant, see Concomitant variation

Visitation, school, public information, 413


Watson-Glaser Critical Thinking Appraisal, 422-23
Wechsler-Bellevue Intelligence Scales, 55, 215
Wechsler Intelligence Scale for Children (WISC), 55, 418, 422
Wesley Test in Political Terms, 182
Wholistic approach, 420
Woodworth Personal Data Sheet, 49
World success criterion of intelligence, 109
World War I, 35-37, 48
World War II, 37, 55, 363

X-O Test, 49

Z-scores, 85, 289-90



